0710.1872/ms.tex
1: %%
2: %% Beginning of file 'sample.tex'
3: %%
4: %% Modified 2005 December 5
5: %%
6: %% This is a sample manuscript marked up using the
7: %% AASTeX v5.x LaTeX 2e macros.
8: 
9: %% The first piece of markup in an AASTeX v5.x document
10: %% is the \documentclass command. LaTeX will ignore
11: %% any data that comes before this command.
12: 
13: %% The command below calls the preprint style
14: %% which will produce a one-column, single-spaced document.
15: %% Examples of commands for other substyles follow. Use
16: %% whichever is most appropriate for your purposes.
17: %%
18: \documentclass[12pt,preprint]{aastex}
19: 
20: %% manuscript produces a one-column, double-spaced document:
21: 
22: %\documentclass[manuscript]{aastex}
23: %\documentclass[preprint]{aastex}
24: 
25: %%\usepackage{amsmath}
26: %%\usepackage{amssymb}
27: %%\usepackage{graphicx}
28: 
29: %% preprint2 produces a double-column, single-spaced document:
30: 
31: %% \documentclass[preprint2]{aastex}
32: 
33: %% Sometimes a paper's abstract is too long to fit on the
34: %% title page in preprint2 mode. When that is the case,
35: %% use the longabstract style option.
36: 
37: %% \documentclass[preprint2,longabstract]{aastex}
38: 
39: %% If you want to create your own macros, you can do so
40: %% using \newcommand. Your macros should appear before
41: %% the \begin{document} command.
42: %%
43: %% If you are submitting to a journal that translates manuscripts
44: %% into SGML, you need to follow certain guidelines when preparing
45: %% your macros. See the AASTeX v5.x Author Guide
46: %% for information.
47: 
48: \newcommand{\vdag}{(v)^\dagger}
49: %\newcommand{\myemail}{skywalker@galaxy.far.far.away}
50: \newcommand{\etal}{{et al.}}
51: \newcommand{\kpc}{$\, {\rm kpc}$}
52: \newcommand{\kms}{$\, {\rm km\,s^{-1}}$}
53: \newcommand{\lsun}{$L_{\odot}$}
54: \newcommand{\msun}{\,$M_{\odot\,}$}
55: \newcommand{\sch}{Schwarzschild\,\,}
56: \newcommand{\ml}{$\Upsilon$}
57: \newcommand{\grad}{^{\circ}}
58: %\newcommand{\vect}[1]{\ensuremath{\mbox{\boldmath $#1$}}}
59: %\newcommand{\isot}{$\rm{^{12}C/^{13}C}\,\,$}
60: %\newcommand{\no}{$\rm{^{14}N/^{16}O}\,\,$}
61: %\newcommand{\feh}{${\rm [Fe/H]\,\,}$}
62: %\newcommand{\sgb}{$\rm SGB\,\,$}
63: %\newcommand{\arcsec}{{''\hskip-3pt .}}
64: 
65: %\def\plotone#1{}
66: 
67: %% You can insert a short comment on the title page using the command below.
68: 
69: %\slugcomment{To be submitted to The Astrophysical Journal}
70: 
71: %% If you wish, you may supply running head information, although
72: %% this information may be modified by the editorial offices.
73: %% The left head contains a list of authors,
74: %% usually a maximum of three (otherwise use et al.).  The right
75: %% head is a modified title of up to roughly 44 characters.
76: %% Running heads will not print in the manuscript style.
77: 
78: \shorttitle{Schwarzschild models of discrete data}
79: \shortauthors{Chanam\'e et al.}
80: 
81: %% This is the end of the preamble.  Indicate the beginning of the
82: %% paper itself with \begin{document}.
83: 
84: \begin{document}
85: 
86: %% LaTeX will automatically break titles if they run longer than
87: %% one line. However, you may use \\ to force a line break if
88: %% you desire.
89: 
90: 
91: \title{Constraining the Mass Profiles of Stellar Systems: \sch Modeling of
92: Discrete Velocity Datasets}
93: 
94: %% Use \author, \affil, and the \and command to format
95: %% author and affiliation information.
96: %% Note that \email has replaced the old \authoremail command
97: %% from AASTeX v4.0. You can use \email to mark an email address
98: %% anywhere in the paper, not just in the front matter.
99: %% As in the title, use \\ to force line breaks.
100: 
101: \author{Julio Chanam\'e\altaffilmark{1}, Jan Kleyna\altaffilmark{2}, \& Roeland van der Marel\altaffilmark{1}}
102: \altaffiltext{1}{Space Telescope Science Institute, 3700 San Martin Dr., Baltimore, MD 21218}
103: \altaffiltext{2}{Institute for Astronomy, University of Hawaii, 2680 Woodlawn Drive, Honolulu, HI 96822}
104: 
105: 
106: 
107: %% Notice that each of these authors has alternate affiliations, which
108: %% are identified by the \altaffilmark after each name.  Specify alternate
109: %% affiliation information with \altaffiltext, with one command per each
110: %% affiliation.
111: 
120: 
121: %% Mark off your abstract in the ``abstract'' environment. In the manuscript
122: %% style, abstract will output a Received/Accepted line after the
123: %% title and affiliation information. No date will appear since the author
124: %% does not have this information. The dates will be filled in by the
125: %% editorial office after submission.
126: 
127: \begin{abstract}
128: 
129: We present a new \sch orbit-superposition code that is designed to
130: model discrete datasets composed of velocity measurements of
131: individual kinematic tracers in a dynamical system. This extends
132: previous implementations, which can only address continuous data
133: in the form of (the moments of) velocity distributions, and thereby
134: avoids potentially important losses of
135: information due to data binning. Furthermore, the code can handle any
136: combination of available velocity components, i.e., only line-of-sight
137: velocities, only proper motions, or a combination of both. It can also
138: handle a combination of discrete and continuous data. The code
139: determines the combination of orbital mass weights (representing the
140: distribution function) as a function of the three integrals of motion
141: $E,L_z,$ and $I_3$ that best reproduces, in a maximum-likelihood
142: sense, the available kinematic and photometric observations in a given
143: axisymmetric gravitational potential. The overall best fit is the one
144: that maximizes the likelihood over a parameterized set of trial
145: potentials. The fully numerical approach ensures considerable freedom
146: in the form of the distribution function $f(E,L_z,I_3)$. This allows a
147: very general modeling of the orbital structure, thus avoiding
148: restrictive assumptions about the degree of (an)isotropy of the
149: orbits. We describe the implementation of the discrete code and
150: present a series of tests of its performance based on the modeling of
151: simulated (i.e., artificial) datasets generated from a known
152: distribution function. We explore pseudo-datasets with varying degrees
153: of overall rotation and different inclinations on the plane of the
154: sky, and study the results as a function of relevant observational
155: variables such as the size of the dataset and the type of velocity
156: information available. We find that the discrete \sch code recovers
157: the original orbital structure, mass-to-light ratio, and inclination
158: of the input datasets to satisfactory accuracy, as quantified by
159: various statistics. The code will be valuable, e.g., for modeling
160: stellar motions in Galactic globular clusters, and modeling the
161: motions of individual stars, planetary nebulae, or globular clusters
162: in nearby galaxies. This can shed new light on the total mass
163: distributions of these systems, with central black holes and dark
164: matter halos being of particular interest.
165: 
166: \end{abstract}
167: 
168: %% Keywords should appear after the \end{abstract} command. The uncommented
169: %% example has been keyed in ApJ style. See the instructions to authors
170: %% for the journal to which you are submitting your paper to determine
171: %% what keyword punctuation is appropriate.
172: 
173: \keywords{stellar dynamics -- galaxies: kinematics and dynamics --
174: dark matter -- galaxies: halos -- methods: numerical}
175: 
176: 
177: 
178: %% From the front matter, we move on to the body of the paper.
179: %% In the first two sections, notice the use of the natbib \citep
180: %% and \citet commands to identify citations.  The citations are
181: %% tied to the reference list via symbolic KEYs. The KEY corresponds
182: %% to the KEY in the \bibitem in the reference list below. We have
183: %% chosen the first three characters of the first author's name plus
184: %% the last two numeral of the year of publication as our KEY for
185: %% each reference.
186: 
187: 
188: %% Authors who wish to have the most important objects in their paper
189: %% linked in the electronic edition to a data center may do so by tagging
190: %% their objects with \objectname{} or \object{}.  Each macro takes the
191: %% object name as its required argument. The optional, square-bracket 
192: %% argument should be used in cases where the data center identification
193: %% differs from what is to be printed in the paper.  The text appearing 
194: %% in curly braces is what will appear in print in the published paper. 
195: %% If the object name is recognized by the data centers, it will be linked
196: %% in the electronic edition to the object data available at the data centers  
197: %%
198: %% Note that for sources with brackets in their names, e.g. [WEG2004] 14h-090,
199: %% the brackets must be escaped with backslashes when used in the first
200: %% square-bracket argument, for instance, \object[\[WEG2004\] 14h-090]{90}).
201: %%  Otherwise, LaTeX will issue an error. 
202: 
203: \section{Introduction}
204: \label{sec.intro}
205: 
206: The study of the internal dynamics of stellar systems plays an
207: essential role in astronomy.  From the observed positions and
208: velocities of the stars in galaxies and globular clusters it is
209: possible to infer their total (dark+luminous) mass distribution,
210: which, in particular, provides information on the presence and
211: properties of dark halos and massive black holes. In turn, this
212: structural knowledge constrains theories for the formation and
213: evolution of these systems.
214: 
215: The dynamical state of a stellar system is determined by its phase
216: space distribution function, $f({\vec r}, {\vec v})$, which counts the
217: stars as a function of position ${\vec r}$ and velocity ${\vec v}$.
218: Typically, however, only three of the six phase-space coordinates are
219: available observationally: the projected sky position $(x',y')$, and
220: the velocity $v_{z'}$ along the line of sight (LOS). Proper motion
221: observations can provide the additional velocities $(v_{x'},v_{y'})$,
222: but such data are generally not available (with the notable exception
223: of some Galactic globular clusters). To make progress with the limited
224: information available, the dynamical theorist is often forced to make
225: simplifying assumptions about geometry (e.g., that the system is
226: spherical) or about the velocity distribution (e.g., that it is
227: isotropic). Such assumptions can have strong effects on the inferred
228: mass distribution (\citealt{bin82}). To obtain the most accurate
229: results it is therefore important to make models that are as general
230: as possible. Of particular importance for collisionless, unrelaxed
231: systems such as galaxies is to constrain the velocity anisotropy using
232: available data, rather than to assume it a priori.
233: 
234: In a collisionless system the distribution function satisfies the
235: collisionless Boltzmann equation. Analytical methods to find solutions
236: of this equation usually rely on the Jeans Theorem, which states that
237: the distribution function must depend on the phase-space coordinates
238: through integrals of motion (quantities that are conserved along a
239: stellar orbit). In a spherical system all integrals are known
240: analytically, namely, the energy $E$ and the components of the angular
241: momentum vector ${\vec L}$. Analytical models for spherical systems
242: are therefore fairly easily constructed. In an axisymmetric system
243: things are more complicated (e.g., \citealt{bt87,mer99}). Only two
244: integrals are known analytically, $E$ and the vertical component
245: $L_{\rm z}$ of the angular momentum vector\footnote{We adopt the
246: notation in which $(x,y,z)$ denote the coordinates intrinsic to the
247: axisymmetric stellar system, with the plane $x-y$ being the equatorial
248: plane, and $z$ the symmetry axis. These relate via the inclination $i$
249: to the observable coordinates $(x',y')$ on the plane of the sky
250: (aligned, respectively, along the projected major and minor axes of
251: the stellar system), and $z'$ the line-of-sight direction, positive in
252: the direction away from us.}, but there is generally a third integral
253: for which no analytical expression exists. Therefore, it is not
254: generally possible to construct an axisymmetric model
255: analytically. The special class of so-called `two-integral'
256: ($f=f(E,L_z)$) models (e.g., \citealt{bat93,deh94,ver02}) has its uses
257: (e.g., \citealt{mag98,vdm06}), but these have an isotropic velocity
258: distribution in their meridional plane, which need not be a good fit
259: to real dynamical systems.
260: 
261: The most practical way to model a general axisymmetric system is to do
262: it numerically. While a few methods exist to do this (e.g.,
263: \citealt{m2m,nmagic}), the most common approach uses Schwarzschild's
264: (1979) method. One starts with a trial guess for the gravitational
265: potential $\Psi$ and then numerically calculates an orbit library that
266: samples integral space in some complete and uniform way. The orbits
267: are integrated for several hundred orbital periods, and the
268: time-averaged intrinsic and projected properties (density, LOS
269: velocity, etc.) are stored as the integration progresses. The
270: construction of a model consists of finding a weighted superposition
271: of the orbits that: (1) reproduces the observed stellar or surface
272: brightness distribution on the sky; and (2) reproduces all available
273: kinematical data to within the observational error bars. Additional
274: constraints can be added to enforce that the distribution function in
275: phase space be smooth and reasonably well behaved, e.g., through
276: regularization or by requiring maximum entropy.
277: 
278: Several axisymmetric Schwarzschild codes have been developed in the
279: last decade (e.g., \citealt{vdm98,cre99,geb00,val04,tho04}). These
280: codes deal with the situation in which information on the
281: line-of-sight velocity distribution (LOSVD) is available for a set of
282: positions on the projected plane of the sky. This is the case, e.g.,
283: when the kinematical data are from long-slit or integral-field
284: spectroscopic observations of unresolved galaxies. The optimization
285: problem for such data can be reduced to a linear matrix equation for
286: which one needs to find the least-squares solution with non-negative
287: weights \citep{rix97}. One dimension of the matrix corresponds to the
288: number of orbits in the library, while the other corresponds to the
289: number of (luminosity, kinematical and regularization) constraints
290: that must be reproduced. Both dimensions are typically in the range
291: $10^3$--$10^4$. Nonetheless, efficient numerical algorithms exist to
292: find the solution, which yields the orbital and velocity
293: distributions of the model, as well as the $\chi^2$ of the fit to the
294: kinematical data. The procedure must then be iterated with different
295: gravitational potentials, to determine the potential that provides the
296: overall best $\chi^2$. The existing codes have been used and tested
297: extensively (e.g.,
298: \citealt{cre00,cap02,cap06,geb03,ben05,dav06}). Some questions remain,
299: e.g., about the importance of smoothing in phase space and the exact
300: meaning of the confidence regions determined using $\Delta \chi^2$
301: contours. In some situations, valid concerns have also been raised
302: regarding whether the available data contain enough information
303: to warrant the conclusions of the \sch modeling
304: \citep{val04,cre04,kra05}. Nevertheless, on the whole Schwarzschild
305: codes have now been established as an accurate and versatile tool to
306: study a wide range of dynamical problems.
307: 
308: A disadvantage of the existing codes is that they cannot be easily
309: applied to the large class of problems in which the kinematical
310: observations come in the form of discrete velocity measurements,
311: rather than as LOSVDs. This is encountered, e.g., when modeling the
312: dynamics of galaxies at large radii, where the low surface brightness
313: prevents integrated-light spectroscopy. The only available data are
314: then often of a discrete nature, e.g., via the LOS velocities of
315: individual stars in galaxies of the Local Group (e.g.,
316: \citealt{bill00,jan01,jan02,lok02,wil04,lok05,wal06,geh06}), or via
317: planetary nebulae (e.g., \citealt{dou02,rom03,teo05}) and globular
318: clusters (e.g., \citealt{cote01,tom04}) surrounding giant
319: ellipticals. The kinematical data available for clusters of galaxies,
320: consisting of redshifts for individual galaxies, are of a similarly
321: discrete nature (e.g., \citealt{lok03}). The typical datasets in all
322: these cases consist of tens to hundreds of LOS velocities. Galactic
323: globular clusters constitute another class of object for which
324: kinematical data are often available only as discrete measurements,
325: rather than in the form of LOSVDs. From ground-based observations,
326: data sets of individual LOS velocities can be available for up to
327: thousands of stars in these systems (e.g.,
328: \citealt{sun96,may97,rei06}), and for $\omega$ Cen it has been
329: possible to assemble large samples of proper motions as well
330: \citep{vleu00}. With the capabilities of {\it HST}, accurate proper
331: motion data sets with up to $\sim 10^4$ stars are now becoming
332: available for several more Galactic globular clusters (e.g.,
333: \citealt{mcn03,mcl06}). 
334: 
335: Note that discrete datasets do not necessarily provide better or worse
336: information than datasets obtained from integrated-light
337: measurements. Both types of data have their advantages and
338: disadvantages. For discrete datasets, for example, interloper
339: contamination can be a problem (see also the end of
340: Section~\ref{sec:logL} below). By contrast, for integrated-light
341: measurements, it is often difficult to constrain the wings of the
342: LOSVD due to uncertainties associated with continuum
343: subtraction. Which type of data is most appropriate and most easily
344: obtained depends on the specific object under study. This is therefore
345: not a question that we address in this paper.  Instead, we focus on
346: the issue of how to best analyze discrete data, if that happens to be
347: what is available.
348: 
349: Analyses of discrete datasets have often been more simplified than the
350: analyses that are now common for integrated-light data. For example,
351: the observations are analyzed using the Jeans equations (e.g.,
352: \citealt{ger02,lok03,cote03,dou07}), often with the help of data
353: binning to calculate rotation velocity and velocity dispersion
354: profiles (see, however, the ``spherical'' Schwarzschild models of M87
355: of \citealt{rom01}). The disadvantage of such an approach is that not
356: all the information content of the data is used, including information
357: on deviations of the velocity histograms from a Gaussian. Such
358: deviations are important because they constrain the velocity
359: dispersion anisotropy of the system (e.g.,
360: \citealt{vdm93,ger93,ger98}). This anisotropy is an important
361: ingredient in some existing controversies, e.g. regarding the presence
362: of dark halos around elliptical galaxies \citep{rom03,dek05}. Loss of
363: information can be avoided when large numbers of datapoints are
364: available, as is often the case for globular clusters. It is then
365: possible to create velocity histograms for binned areas on the
366: projected plane of the sky, after which analysis can be done with
367: existing Schwarzschild codes (e.g., \citealt{bos06}). While this is
368: possible for large datasets, such an approach is not viable for the
369: more typical, smaller datasets that are often available. The
370: availability of Schwarzschild codes that can fully exploit the
371: information content of such smaller datasets would therefore be
372: valuable to advance this subject.
373: 
374: Motivated by these considerations we set out to adapt our existing
375: Schwarzschild code \citep{vdm98} to deal with discrete datasets. This
376: does not constitute a trivial change, since it changes the constrained
377: superposition procedure from a linear matrix problem to a more
378: complicated maximum likelihood one. For each observed velocity of a
379: particle in the system the question becomes: what is the probability
380: that this velocity would have been observed if the model is correct?
381: The overall likelihood of the data, given a trial model, is the
382: product of these probabilities for all observations. Such likelihood
383: problems have previously been solved for spherical systems
384: \citep{mer93,vdm00,wu06} and the special class of axisymmetric
385: $f(E,L_z)$ systems \citep{mer97,wu07}. However, for the axisymmetric
386: Schwarzschild modeling approach the problem corresponds to finding the
387: minimum of a function in a space with a dimension of
388: $10^3$--$10^4$. We show in this work, via the \sch modeling of
389: simulated datasets, that this problem can indeed be solved
390: successfully and efficiently. Moreover, we follow \cite{glenn06} and
391: implement in our new code the ability to calculate and fit proper
392: motions in addition to LOS velocities. Applications of the code to
393: real datasets will be presented in forthcoming papers.
394: 
395: The structure of the paper is as follows. In Section \ref{sec:logL} we
396: phrase the new problem of fitting a \sch model to a dataset of
397: discrete velocities (of one, two, or three dimensions) of individual
398: kinematic tracers in terms of a likelihood formalism. Section
399: \ref{sec:code} describes the implementation of the discrete fitting
400: procedure into our existing \sch code. At the same time, we summarize
401: here the major steps involved in the construction of the probability
402: matrix that describes the likelihood of a given kinematic data point
403: belonging to some particular orbit of the library. We then present in
404: Section \ref{sec:tools} sets of simulated data that we use for the
405: purpose of testing the performance of the discrete \sch code. We also
406: describe the known input distribution functions from which these data
407: were drawn. The application of the code to the simulated datasets is
408: presented in Section \ref{sec:tests}. We present a thorough analysis
409: of the accuracy with which our discrete \sch code recovers the known
410: distribution function, mass-to-light ratio and inclination used to
411: generate the simulated data. Finally, in Section \ref{sec:end} we
412: summarize our findings and present our conclusions.
413: 
414: \section{Linear and non-linear constraints in the likelihood formalism}
415: \label{sec:logL}
416: 
417: In the \sch scheme the properties of every orbit $j$ in the orbit
418: library are computed and stored. The modeling consists in finding the
419: superposition of orbital weights $a_j^2$, i.e., the fraction of
420: particles in the system residing in each orbit, that best reproduces
421: some set of constraints. The weights are written as squares to ensure
422: that they never become negative. Linear constraints are of the form
423: %
424: \begin{equation}
425: \label{eq.constraint}
426:   {\gamma_k}^* = {\gamma_k} \pm {\sigma_k} , \quad k=1 \ldots M .
427: \end{equation}
428: %
429: Here ${\gamma_k}$ is a constraint value that needs to be reproduced,
430: ${\sigma_k}$ is its uncertainty, and ${\gamma_k}^*$ is its model
431: prediction
432: %
433: \begin{equation}
434: \label{eq.gamma.def}
435:   {\gamma_k}^* = \sum_j B_{kj} a_j^2/\sum_j a_j^2 .
436: \end{equation}
437: %
438: The matrix $B_{kj}$ represents here, for orbit $j$, the probability
439: distribution corresponding to the constraint $\gamma_k$. The
440: constraints are generally one of the following: (a) the integrated
441: light (surface brightness) of a stellar population in some aperture
442: number in the projected plane of the sky, necessary to reproduce an
443: observational measurement of the surface brightness; (b) the mean LOS
444: velocity, velocity dispersion, or, for data of sufficient quality, a
445: higher-order Gauss-Hermite moment in some aperture number in the
446: projected plane of the sky, necessary to reproduce an observational
447: measurement of the stellar kinematics; (c) the integrated mass in some
448: meridional $(R,z)$ plane grid point, necessary to provide a consistent
449: model; (d) a combination of distribution function moments in some
450: meridional $(R,z)$ plane grid point, if a model with a particular
451: dynamical structure is desired (e.g., one may want a model with $\rho
452: (\overline{v_R^2}-\overline{v_z^2})$ equal to zero in order to
453: simulate a two-integral $f(E,L_z)$ model); (e) a combination of orbit
454: weights, if regularization constraints are desired to enforce
455: smoothness of the model in phase space (e.g., one can set the $N$th-order
456: divided difference of adjacent orbit weights to zero, with an
457: uncertainty ${\sigma_k}$ that sets the desired amount of
458: smoothing).
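
To make constraint (e) concrete, the following is a minimal sketch
under assumptions adopted here for illustration only: the orbits are
ordered along a single integral of motion on a uniformly spaced grid,
and $N=2$. The second-order divided difference of three adjacent orbit
weights is then proportional to their second difference, which enters
equation~(\ref{eq.gamma.def}) as one row of $B_{kj}$ with a target
value of zero,
%
\begin{displaymath}
  B_{kj} = \delta_{j,m-1} - 2\,\delta_{j,m} + \delta_{j,m+1} , \qquad
  {\gamma_k} = 0 , \qquad
  {\gamma_k}^* = \frac{a_{m-1}^2 - 2\,a_m^2 + a_{m+1}^2}{\sum_j a_j^2} ,
\end{displaymath}
%
where $m$ is a hypothetical index labeling the central orbit of the
triplet. The smaller the uncertainty assigned to this constraint in
equation~(\ref{eq.constraint}), the more closely neighboring weights
are forced to follow a linear trend, i.e., the smoother the model.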
459: 
460: It is natural to choose the best-fitting model to be the one that
461: produces the maximum likelihood. To determine the likelihood we need
462: to write down an expression for the probability of measuring
463: $\gamma_k$ among all its possible values. To do this, we recall that
464: a model is not an attempt to reproduce a set of observations to
465: infinite accuracy, but only to within the uncertainty
466: $\sigma_k$. For observational constraints, such as those in (a) and
467: (b) above, $\sigma_k$ is equal to the measurement uncertainty. For
468: other constraints, such as those in (c)--(e) above, $\sigma_k$ can be
469: used as a forcing parameter that controls how sharply the likelihood
470: peaks around a particular value of $\gamma_k$. If one assumes
471: that these uncertainties have a normal (Gaussian) distribution, then
472: the probability we are interested in is given by
473: %
474: \begin{eqnarray}
475: \label{eq.non.linear.term}
476: P(\gamma_k) &=& 
477: {1\over \sqrt{2\pi} {\sigma_k}} 
478: \exp\left[-{1\over 2 {\sigma_k}^2}
479: \left(\gamma_k- {\gamma_k}^* \right)^2 \right] .
480: %\\
481: %&=&{1\over \sqrt{2\pi} {\sigma_k}} 
482: %\exp\left[-{1\over 2 {\sigma_k}^2}
483: %\left(\gamma_k- {\sum_j  B_{kj} a_j^2 \over\sum_j a_j^2} \right)^2 
484: %\right] . \nonumber
485: \end{eqnarray}
486: %
487: The combined probability for the simultaneous occurrence of all $M$
488: linear constraints is then given by the product of the single
489: probabilities, $L_{\rm linear} = \prod_k P(\gamma_k)$. Using equation
490: (\ref{eq.non.linear.term}), the logarithm of this linear part of the
491: likelihood is therefore
492: 
493: \begin{equation}
494: \label{eq:loglinear}
495:   \ln L_{\rm linear} = - \sum_{k=1}^M  \ln {\sqrt{2\pi} \sigma_k} \, - \,
496:           \sum_{k=1}^M \left(
497:                   \frac{\gamma_k-\gamma_k^*}{\sqrt{2}\,\sigma_k}\right)^2 . 
498: \end{equation}
499: %
500: The first sum on the right-hand side of this expression does not
501: depend on the orbital weights $a_j^2$ and, therefore, does not affect
502: the likelihood maximization. The second term equals $-\chi^2/2$, where
503: $\chi^2$ is the usual goodness-of-fit statistic. Maximizing the likelihood is therefore equivalent
504: to minimizing this $\chi^2$. This can be done by finding the
505: solution of the set of equations~(\ref{eq.constraint}) and
506: (\ref{eq.gamma.def}), which can be rewritten as an overdetermined
507: matrix equation. This matrix equation can be solved with the use of
508: standard non-negative least-squares (NNLS) algorithms (see
509: \citealt{rix97} for a detailed description).
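
As an illustration of that matrix form, consider the following sketch,
in which the normalized weights $w_j$, the matrix $A_{kj}$, and the
vector $b_k$ are shorthand introduced here for convenience and are not
part of the formalism above. Writing
$w_j \equiv a_j^2/\sum_i a_i^2 \geq 0$,
equation~(\ref{eq.gamma.def}) becomes
${\gamma_k}^* = \sum_j B_{kj} w_j$, so that
%
\begin{displaymath}
  \chi^2 = \sum_{k=1}^M
     \left( \frac{\gamma_k - \sum_j B_{kj} w_j}{\sigma_k} \right)^2
   = \left\| A\,{\bf w} - {\bf b} \right\|^2 , \qquad
  A_{kj} \equiv \frac{B_{kj}}{\sigma_k} , \quad
  b_k \equiv \frac{\gamma_k}{\sigma_k} .
\end{displaymath}
%
Minimizing this quadratic form subject to $w_j \geq 0$ is a standard
NNLS problem. In this sketch we assume that the normalization
$\sum_j w_j = 1$ is enforced separately, for instance through one
additional constraint row.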
510: 
511: In the case of discrete data, however, the introduction of constraints
512: of a ``non-linear'' type is inevitable in order to adequately exploit
513: the entire information content available, avoiding restrictive
514: simplifications and loss of information due to binning.  This occurs
515: because the individual probabilities do not necessarily have the
516: simple, Gaussian form of equation (\ref{eq.non.linear.term}).  The
517: procedure for finding the maximum likelihood then cannot be cast as
518: the solution of a linear matrix equation anymore.
519: 
520: Suppose we have a kinematic dataset consisting of discrete
521: measurements which we are trying to model using the \sch
522: technique. Let $P_j({\bf q})$ be the phase-space probability
523: distribution of orbit $j$, properly averaged azimuthally, and
524: normalized such that $\int{P_j({\bf q}){\rm d^3}r\,{\rm d^3}v} = 1$.
525: We use ${\bf q}$ to denote a vector of up to six Euclidean spatial and
526: velocity coordinates. Whenever ${\bf q}$ has fewer than six elements,
527: it is understood that the distribution has been marginalized over the
528: missing dimensions. Then the total probability of drawing a particle
529: from a superposition of orbits representing the whole system is
530: %
531: \begin{equation}
532: \label{eq.prob.q}
533:   P({\bf q}) = \sum_j a^2_j P_j({\bf q}) /\sum_j a_j^2 .
534: \end{equation}
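
As a consistency check, and using only the normalization of the
individual $P_j({\bf q})$ stated above, the superposition in
equation~(\ref{eq.prob.q}) is itself a normalized probability
distribution:
%
\begin{displaymath}
  \int P({\bf q})\,{\rm d^3}r\,{\rm d^3}v
   = \frac{\sum_j a_j^2 \int P_j({\bf q})\,{\rm d^3}r\,{\rm d^3}v}
          {\sum_j a_j^2}
   = \frac{\sum_j a_j^2}{\sum_j a_j^2} = 1 .
\end{displaymath}
%
It also follows that $P({\bf q})$ is unchanged when all $a_j$ are
multiplied by a common non-zero factor, so that only the relative
orbital weights matter.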
535: 
536: We now need to consider the total probability of the ensemble of $N$
537: particles with kinematic information that constitute our discrete
538: dataset. Before this, however, it is necessary to make the
539: distinction, in the language of probabilities, between the possible
540: modes of sampling of the tracers available in a system of particles.
541: The two main possibilities depend on whether the particles are
542: randomly or non-randomly drawn from their spatial distribution, and we
543: may refer to these, respectively, as random positional sampling and
544: incomplete positional sampling. Additionally, particles may be drawn
545: with or without velocity information, thus adding up to a total of
546: four possibilities. The case with incomplete positional sampling and
547: no velocity information, however, does not provide any useful
548: constraint to the analysis and therefore we restrict the discussion to
549: the remaining three cases.
550: 
551: For particles drawn randomly from the spatial distribution with no
552: velocity information, the probability $P({\bf q})$ is
553: 
554: \begin{equation}
555: \label{eq.prob.r}
556: P({\bf q}) = P({\bf r}) = \sum_j a^2_j P_j({\bf r}) /\sum_j a_j^2\,,
557: \end{equation}
558: 
559: \noindent where ${\bf r}$ represents a two- or three-dimensional position. This
560: type of dataset could result from imaging of the resolved populations
561: of a stellar system, where the positional information could be used as
562: actual constraints. This would force the model to fit the underlying
563: spatial distribution of discrete tracers, instead of making use of a
564: parametrization of the (continuous) brightness profile of the system.
565: Of course, a dataset without velocity information cannot by itself
566: constrain the dynamical state or the mass of the system.
567: 
568: In the case of random positional sampling including velocity
569: information, particles are randomly drawn from both the spatial and
570: velocity distributions. In this case, $P({\bf q})$ has the form
571: 
572: \begin{equation}
573: \label{eq.prob.rv}
574: P({\bf q}) = P({\bf r},{\bf v}) = \sum_j a^2_j P_j({\bf r},{\bf v}) /\sum_j a_j^2\,,
575: \end{equation}
576: 
577: \noindent where ${\bf r}$ is the same as above and ${\bf v}$
578: represents a general 1, 2, or 3 dimensional velocity. This would be
579: the case when being able to obtain the velocities of particles in a
580: given field without introducing any spatial or velocity bias, such as
581: the proper motions of all stars (brighter than some magnitude limit)
582: in a sufficiently sparse stellar cluster, or when LOS velocities are
583: obtained for a complete (or possibly magnitude-limited) set of
584: globular clusters or planetary nebulae in a galaxy.
585: 
586: In contrast, having {\it incomplete} positional sampling means that
587: the particles are drawn from a velocity distribution, with {\it a
588: priori} fixed positions. This can occur, for example, when because of
589: the usually limited availability of telescope time and resources, LOS
590: velocities are measured only for stars within some distance from the
591: photometric major or minor axes of a galaxy, or when because of the
592: finite size of fibers in a fiber-fed spectrograph, not all the
593: potentially observable kinematic tracers in the field can be actually
594: acquired. Incomplete positional sampling also arises when, even though
595: particles can be randomly drawn spatially, this is the case only for a
596: limited area. This occurs, for example, when the observations have to
597: avoid the innermost regions of a galaxy or globular cluster, where,
because of crowding, stars cannot be individually resolved. In these
cases, $P({\bf q})$ has the form
600: 
601: \begin{equation}
602: \label{eq.prob.rfix}
603: P({\bf q}) = P({\bf v}|{\bf r}) = \sum_j a^2_j P_j({\bf r}) P_j({\bf v}|{\bf r})
604:     /\sum_j a_j^2 P_j({\bf r}),
605: \end{equation}
606: 
607: \noindent where, rather than just $a_j^2$, the effective weights when
608: summing together the individual orbital distributions are
609: $a_j^2\,P_j({\bf r})$.
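
As a short illustrative check, equation (\ref{eq.prob.rfix}) can be
recovered from the definition of conditional probability, assuming
(as the factorization above implies) that each orbit satisfies
$P_j({\bf r},{\bf v}) = P_j({\bf r})\,P_j({\bf v}|{\bf r})$. Dividing
equation (\ref{eq.prob.rv}) by equation (\ref{eq.prob.r}), the common
normalization $\sum_j a_j^2$ cancels:
%
\begin{displaymath}
P({\bf v}|{\bf r}) = {P({\bf r},{\bf v}) \over P({\bf r})}
 = {\sum_j a^2_j P_j({\bf r},{\bf v}) \over \sum_j a^2_j P_j({\bf r})}
 = {\sum_j a^2_j P_j({\bf r})\, P_j({\bf v}|{\bf r}) \over \sum_j a^2_j P_j({\bf r})} .
\end{displaymath}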
610: 
611: Once the individual probabilities for all possible cases of spatial
612: sampling that comprise the data have been properly assigned, we can
613: proceed to the construction of the total probability of observing the
614: entire dataset. Let $N_1$ and $N_2$ be the number of observational
615: data points obtained under the mode of random positional sampling
616: without and with velocity information, respectively, and $N_3$ the
617: number of data points obtained with incomplete positional sampling
618: with kinematic information. Then, the total probability is simply the
619: product of the individual probabilities, with logarithm given by $\ln
620: L_{\rm discrete} = \sum_{i=1}^{N_1}\ln P({\bf r_i}) +
621: \sum_{i=1}^{N_2}\ln P({\bf r_i},{\bf v_i}) + \sum_{i=1}^{N_3}\ln
622: P({\bf v_i}|{\bf r_i})$. Using equations (\ref{eq.prob.r}) to
623: (\ref{eq.prob.rfix}), and adopting the abbreviated notation
624: $p^{(r)}_{i,j} \equiv P_j({\bf r_i})$, $p^{(q)}_{i,j} \equiv P_j({\bf
625: r_i,v_i})$, and $p^{(v)}_{i,j} \equiv P_j({\bf v_i | r_i})$ (all known
626: for each orbit $j$ and particle $i$ from the orbit library
627: calculation; see \S\,3), the quantity to maximize becomes
628: %
629: \begin{eqnarray}
630: \label{eq.total.likelihood.1}
631: \ln L_{\rm discrete}
632: &=&  \sum_{i=1}^{N_1} \left(\ln \sum_j a^2_j p^{(r)}_{i,j} -  \ln \sum_j a^2_j\right) \\
633: &+&  \sum_{i=1}^{N_2} \left(\ln \sum_j a^2_j p^{(q)}_{i,j} -  \ln \sum_j a^2_j\right) \nonumber \\
634: &+&  \sum_{i=1}^{N_3} \left( \ln \sum_j a^2_j p^{(r)}_{i,j}  p^{(v)}_{i,j} -  \ln \sum_j a^2_j  p^{(r)}_{i,j} \right). \nonumber
635: \end{eqnarray}
636: %
637: Joining the results in equations (\ref{eq:loglinear}) and
638: (\ref{eq.total.likelihood.1}), the complete log-likelihood for a
639: general application of the \sch method, which is the full expression
640: to be maximized with respect to the orbital weights $a_j$, is the sum
641: of the log-likelihoods for linear and discrete constraints
642: %
643: \begin{eqnarray}
644: \label{eq.logL}
645: \ln L = \ln L_{\rm linear} + \ln L_{\rm discrete}.
646: \end{eqnarray}
647: %
648: Finding the maximum likelihood corresponds to finding the solution of
$\partial(\ln L)/\partial a_l = 0$, for all $l$. Denoting $s=\sum_j
650: a_j^2$, the expression for the first derivative is
651: %
652: \begin{eqnarray}
653: \label{eq.1st.deriv}
654: {\partial \ln L \over \partial a_l} 
655: &=& -\,{2 a_l \over s}
656: \sum_{k=1}^M {1\over {\sigma_k}^2} 
657:        \left( \gamma_k - {\gamma_k}^* \right)
658:        \left( - B_{kl} + {{\gamma_k^*}}   \right)\\
659: && +\,2 a_l \sum_{i=1}^{N_1} \left( {p^{(r)}_{i,l}\over \sum_j  a^2_j p^{(r)}_{i,j}} - {1\over s} \right)\nonumber \\
660: && +\,2 a_l \sum_{i=1}^{N_2} \left( {p^{(q)}_{i,l}\over \sum_j  a^2_j p^{(q)}_{i,j}} - {1\over s} \right)\nonumber \\
661: && +\,2 a_l \sum_{i=1}^{N_3} \left( {p^{(r)}_{i,l} p^{(v)}_{i,l}\over 
662:      \sum_j  a^2_j p^{(r)}_{i,j}  p^{(v)}_{i,j}}
663:   -  { p^{(r)}_{i,l} \over \sum_j  a^2_jp^{(r)}_{i,j} } \right) . \nonumber
664: \end{eqnarray} 
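
As an illustration of how the discrete terms in equation
(\ref{eq.1st.deriv}) arise, consider the contribution of a single
datum $i$ of the first kind (random positional sampling without
velocities). Using $\partial s/\partial a_l = 2a_l$ and the chain
rule,
%
\begin{displaymath}
{\partial \over \partial a_l}
\left(\ln \sum_j a^2_j p^{(r)}_{i,j} - \ln s\right)
 = {2 a_l\, p^{(r)}_{i,l} \over \sum_j a^2_j p^{(r)}_{i,j}} - {2 a_l \over s}
 = 2 a_l \left( {p^{(r)}_{i,l}\over \sum_j a^2_j p^{(r)}_{i,j}} - {1\over s} \right) .
\end{displaymath}
%
The $N_2$ and $N_3$ terms follow in the same way, with $p^{(r)}_{i,j}$
replaced by $p^{(q)}_{i,j}$, or with $s$ replaced by $\sum_j a^2_j
p^{(r)}_{i,j}$, respectively.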
665: 
One important remaining question concerns the estimation of
667: confidence regions around the parameters of the best-fitting model,
668: i.e., the (statistical) uncertainties around the likelihood maximum in
669: the case of non-linear constraints. Recalling that maximizing $\ln L$
670: is equivalent to minimizing the quantity $\lambda = -2\ln L$, it is
671: easy to realize that, if the probabilities involved in equation
672: (\ref{eq.total.likelihood.1}) were all of Gaussian form, then
673: $\lambda$ would simply reduce to the well known $\chi^2$ statistic, as
674: we have already seen for the case with linear constraints in equation
675: (\ref{eq:loglinear}). When dealing with non-linear constraints,
676: however, the likelihood does not reduce to a simple $\chi^2$
677: form. Nevertheless, one still can use another well known theorem of
678: statistics which, used before by \citet{mer93a} and \citet{vdm00},
679: states that the ``likelihood-ratio'' statistic $\lambda - \lambda_{\rm
680: min}$ does tend to a $\chi^2$ statistic in the limit of large $N$,
681: with the number of degrees of freedom equal to the number of free
682: parameters that have not yet been varied and chosen so as to optimize
683: the fit. Therefore, the likelihood-ratio statistic reduces to the
684: $\Delta\chi^2$ statistic for $N\rightarrow\infty$, even though the
685: probabilities in equation (\ref{eq.total.likelihood.1}) are not all
686: individually Gaussian. Since in the present work we explore datasets
687: consisting of 100 kinematic measurements or more, the condition of
688: large $N$ should be reasonably fulfilled. Therefore, following the
689: likelihood-ratio statistic, we assume $\Delta\chi^2 = -2(\ln L-\ln
690: L_{\rm max})$, and compute the confidence regions around the best-fit
691: parameters in the usual way (e.g., \citealt{recipes}), i.e., with the
692: $1\sigma$ error for a single parameter corresponding to wherever
693: $\Delta\chi^2 = 1$, and so forth. Other approaches to quantify the
694: uncertainties exist as well, e.g., using Bayesian statistics, but
695: these are generally more difficult to implement (e.g.,
696: \citealt{mag06}).
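
To make the correspondence explicit, since $\Delta\chi^2 = 2(\ln
L_{\rm max} - \ln L)$, the usual single-parameter thresholds
$\Delta\chi^2 = 1$, 4, and 9 translate into drops in the
log-likelihood of
%
\begin{displaymath}
\ln L_{\rm max} - \ln L = {1\over 2}\,\Delta\chi^2 = 0.5,\ 2.0,\ 4.5
\quad {\rm for} \quad 1\sigma,\ 2\sigma,\ 3\sigma .
\end{displaymath}
%
These numbers are the standard values for a $\chi^2$ distribution with
one degree of freedom and are quoted here only as an illustration; for
the joint 68.3\% region of two parameters the corresponding threshold
is $\Delta\chi^2 = 2.30$, i.e., a drop of 1.15 in $\ln L$ (e.g.,
\citealt{recipes}).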
697: 
698: The equations described above assume that any possible ``interloper''
699: contaminants have already been removed, and that the targets with
700: observed velocities that enter the likelihood equations all belong to
701: the system under study. For realistic datasets, contamination by
702: interlopers can certainly be a problem \citep{lok05}; i.e., targets
703: that happen to lie close to the line-of-sight of the stellar system
704: under study and are difficult to reject from the sample. However,
705: efficient interloper rejection schemes do exist for various types of
706: samples and these have been well-described in the literature
707: \citep{woj07a,woj07b}. Moreover, the use of empirically-calibrated
708: selection criteria (independent of the measured velocity) can produce
709: extraordinarily clean samples for kinematic analysis
710: \citep{gil06,gil07,sim07}. Either way, interloper rejection is best
711: discussed in the context of specific data sets. We therefore do not
712: discuss it further in the present paper. Interloper rejection for
713: discrete data sets can also be built in as part of the likelihood
714: analysis \citep{vdm00}, so a simple modification of the likelihood
715: equations given above could deal with interlopers explicitly. However,
716: we have not yet explored this in the present context.
717: 
718: \section{Computational Implementation}
719: \label{sec:code}
720: 
721: Given equations (\ref{eq.logL}) and (\ref{eq.1st.deriv}), fitting a
722: \sch model to the data requires the following two steps: (a)
723: determination of all the individual probabilities $p_{i,j}$ and matrix
724: elements $B_{kj}$, so that the only unknowns in equation
725: (\ref{eq.1st.deriv}) are the coefficients $a_l$; and (b) performing
726: the maximization of the total likelihood, i.e., finding the set of
orbital weights $a_l$ that satisfies $\partial(\ln L)/\partial a_l =
0$, for all $l$, and therefore best fits the available constraints. The
729: elements of the matrix $B_{kj}$, corresponding to the linear
730: constraints discussed in \S\,\ref{sec:logL}, are calculated in the
731: same way as in the old (continuous) implementation of the code, and
732: for them we refer to \citet{rix97}, \citet{vdm98} and
733: \citet{cre99}. In what follows we concentrate on the probabilities
734: $p_{i,j}$ associated with the discrete treatment that is the subject
735: of this work.
736: 
737: \subsection{Calculation of Individual Probabilities}
738: \label{sec:pij}
739: 
740: The matrix elements $p_{i,j}$ in equation (\ref{eq.1st.deriv}), which
741: keep track of the probability that orbit $j$ of the library would have
742: produced the measurement $i$ of the dataset (each $j$ corresponding to
743: some combination of the three integrals of motion $E$, $L_z$, and
744: $I_3$), are stored as the orbit in question is being computed. That
745: is, at every time step during the orbit integration, we check whether
the position and velocity along the orbit are consistent with any of
747: the observational datapoints.  To accomplish this, it is necessary to
748: implement some degree of {\it smoothing}, both in position and
749: velocity space, since otherwise the probability of having a particle
750: on an orbit at exactly the observed position and velocity would be
751: infinitesimally small.
752: 
753: Smoothing in the spatial coordinates is accomplished through the
754: definition of an {\it aperture} around the position of each particle
755: in the dataset, with the size of the aperture controlling the amount
756: of smoothing. The optimal aperture size will be somewhat dependent on
757: the sampling characteristics of the data. In general, apertures should
758: not be too small, or otherwise few time steps during orbit integration
759: will fall on any one of them. This would lead to large shot noise in
760: the computed probabilities $p_{i,j}$, unless the orbits are integrated
761: for very long times. Nor should the apertures be too large, so that
762: information on the orbital structure of the model is not erased by
763: excessive spatial smoothing. The choice of aperture shape is arbitrary
764: and a matter of numerical convenience. We adopt square apertures as in
765: previous implementations of the code (long-slit observations naturally
766: produce data for rectangular apertures), and set their sizes to a
767: user-supplied fraction of $R$, the radius in the projected plane at
768: the aperture's position.
769: 
770: Once the spatial apertures are defined, and every time the projected
771: position along the orbit being integrated falls within an aperture, we
772: need to keep track of whether the orbital velocity matches the
773: observed velocity. In the old (continuous) implementation, the LOSVD
774: was computed and stored for every orbit $j$ at each aperture $i$, with
775: the size of the bins in the histogram determining the amount of
776: smoothing in velocity space. In our discrete treatment of the problem,
777: $p_{i,j}$ would simply be the histogram value for the bin that
contains the observed velocity. A direct, though informationally
incomplete, generalization of this implementation to kinematical data
780: in three-dimensions would be to keep track of two additional
781: histograms at each aperture to account for $\mu_{x'}$ and
782: $\mu_{y'}$. This has been done by \citet{glenn06} and \citet{bos06},
783: who calculated moments of the three model velocity distributions and
784: fitted them to those obtained from binning LOS and proper-motion
785: observations of stars in $\omega$ Cen and M15, respectively (note that
786: these studies still handle the data in a continuous fashion, by
787: reducing the initially discrete datasets to binned velocity
788: distributions at a number of apertures on the sky, an approach only
789: possible thanks to the very large number of stars with measured
790: velocities in these systems).
791: 
792: While reproducing the three-dimensional mean velocities and
793: dispersions of the stars in a stellar system is already an improvement
794: over all previous implementations of the \sch technique, doing so is
795: nevertheless a simplification of the problem. The reason is that it
796: implicitly assumes that the three velocity components are independent
797: of each other, i.e., it does not account for the fact that there is a
798: velocity ellipsoid whose cross terms are, in the most general case,
799: not identical to zero. The most complete treatment would be to store a
800: cube with entries for all possible combinations of
801: $(\mu_{x'},\mu_{y'},v_{z'})$, and do this at each spatial aperture where there
802: is kinematical data available.  This implementation would be, however,
803: expensive in terms of memory storage and, moreover, not absolutely
804: necessary, simply because we are not interested in the entire
805: probability cube. Instead, we only need probabilities in the cases
806: when the model velocities are close to the observed ones. Thus, in the
807: framework of velocity histograms or full velocity cubes, and because
808: of the discrete nature of the data, the large majority of the bins or
809: entries would be filled with weights that do not affect the likelihood
810: in equation (\ref{eq.logL}).
811: 
812: Therefore, we adopt an approach in which, instead of storing velocity
813: histograms or cubes, every time an orbit $j$ passes through an
814: aperture $i$ with kinematical data, we add a Gaussian contribution to
815: $p_{i,j}$. This contribution is centered on the observed
816: (any-dimensional) velocity and has a dispersion that reflects the
817: measurement errors, and if desired, any amount of extra velocity
818: smoothing. Thus, denoting the actually measured components of the
819: particle's velocity in aperture $i$ as $v_{ik}$ and their associated
820: uncertainties as $e_{ik}$, with $k=1\ldots3$ corresponding to
821: $v_{x'}$, $v_{y'}$, and $v_{z'} = v_{\rm los}$, the multiplicative
822: contribution $w_{ik}^{(j)}$ to the probability has the form
823: %
824: \begin{eqnarray}
825: \label{eq.weights}
826: w_{ik}^{(j)} &=& 
827: {1\over \sqrt{2\pi\left(\xi_k^2 + e_{ik}^2\right)}} \exp{\left[-\,\frac{\left(v_{jk}-v_{ik}\right)^2}{2 \left(\xi_k^2+e_{ik}^2\right)}\right]},
828: \end{eqnarray}
829: %
830: where $v_{jk}$ is the component $k$ of the velocity of a test particle
831: on orbit $j$. The quantity $\xi_k$ is the numerical smoothing assigned
832: to velocity component $k$. Whenever a particular component $k$ of the
833: velocity is not available, we set $w_{ik}^{(j)} = 1$. Finally, in
834: order to account for the fact that we represent a continuous orbit by
835: a discrete sequence of time steps, we weigh this Gaussian factor by
836: multiplying it by the timestep $\Delta t_j$. Therefore, for every
837: orbit $j$, and every time the orbit integration falls within an
838: aperture, the probability is increased according to
839: %
840: \begin{eqnarray}
841: \label{eq.pij}
842: p_{i,j} = p_{i,j} + \Delta t_j \prod_{k=1}^3 w_{ik}^{(j)}.
843: \end{eqnarray}
844: %
845: When the integration of orbit $j$ is done, the $p_{i,j}$ elements for
846: all datapoints (apertures) are written to a file for later use by the
847: algorithm that performs the maximization of the likelihood. 
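
As a purely illustrative numerical example of equation
(\ref{eq.weights}), with hypothetical values not taken from any of our
datasets, consider a single LOS velocity with error $e_{ik} =
10\,{\rm km\,s^{-1}}$, no numerical smoothing ($\xi_k = 0$), and an
orbital velocity that differs from the observed one by $v_{jk}-v_{ik}
= 10\,{\rm km\,s^{-1}}$. Then
%
\begin{displaymath}
w_{ik}^{(j)} = {1\over \sqrt{2\pi}\,\left(10\,{\rm km\,s^{-1}}\right)}
 \exp\left(-{1\over 2}\right)
 \approx 0.0242\,\left({\rm km\,s^{-1}}\right)^{-1} .
\end{displaymath}
%
For an offset of $30\,{\rm km\,s^{-1}}$ the exponential becomes
$\exp(-4.5)$, so the contribution is smaller by a factor $e^{-4}
\approx 0.018$. Timesteps at which the orbital velocity lies far from
the observed one therefore add very little to $p_{i,j}$ in equation
(\ref{eq.pij}).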
848: 
849: In most practical applications one can set $\xi_k = 0$, since the
850: error bars $e_{ik}$ on the data already provide sufficient natural
851: smoothing for numerical efficiency. We do this throughout the rest of
852: this paper. However, we note that there may be situations in which
non-zero $\xi_k$ may be beneficial. For example, if the observational
errors $e_{ik}$ are much smaller than the velocity dispersions
$\sigma_k$ of the system, it takes very long integrations to beat
down the shot noise in the orbital distributions $p_{i,j}$. Addition
857: of a numerical smoothing $\xi_k$ with $e_{ik} \ll \xi_k \ll \sigma_k$
858: can then speed up the calculations without affecting the accuracy of
859: the results.
860:  
861: The approach of equations (\ref{eq.weights}) and (\ref{eq.pij})
862: assumes that the errors $e_{ik}$ for the different datapoints are
863: uncorrelated. Sometimes this is not true, as in the case of the proper
864: motions of stars in the globular cluster $\omega$ Cen, where relative
865: rotation between the old photographic plates used in their derivation
produces an artificial overall rotation of the cluster
\citep{glenn06}. If problems like these cannot be removed before
868: modeling, a more complicated treatment than the one described here
869: will be necessary.
870: 
871: \subsection{Finding the maximum likelihood solution}
872: \label{sec.mkfitin}
873: 
874: The non-linear nature of the discrete problem addressed in this paper
875: requires the use of a non-linear optimizer, and there is no guarantee
876: of a unique optimum. After experimentation with various optimization
877: algorithms, we settled on the TOMS 500 conjugate gradient optimizer of
878: \citet{sha80}. This code uses the function value and gradient to
879: optimize along successive vectors (lines) in the space of the orbital
880: weights, choosing the optimization direction at every pass in a manner
881: that attempts to minimize the number of such line minimizations needed
882: (see Chapter 10 of \citealt{recipes} for details on conjugate gradient
883: methods).
884: 
885: In our code, we rely on the fact that the majority of orbits do not
886: contribute to any particular linear constraint, or to the likelihood
887: of any particular observational datum. In the notation of equation
888: (\ref{eq.1st.deriv}), the linear constraints $B_{k,l}$, and also the
889: $p_{i,l}$, are sparse matrices. Accordingly, the code to evaluate the
890: gradient in equation (\ref{eq.1st.deriv}) is written to store and
evaluate only non-zero terms of $B_{k,l}$ and $p_{i,l}$, reducing the
892: computational burden by a factor of four or five.
893: 
894: To evaluate convergence and estimate the proximity of our final
895: likelihood maximum to the true (possibly local) maximum, we plot the
896: magnitude of the improvement of the likelihood $\delta\lambda$ as a
function of the number of function evaluations $N$ (see Figure
\ref{fig:mkfitin}). We find that $\delta\lambda$ is well represented by
899: an exponential relation $\delta\lambda \sim \exp{(-aN)}$, where
900: $a\approx 10^{-5}$. Therefore, the future change in the likelihood if
901: the optimizer were allowed to run forever would be $\Delta \ln L \sim
902: \int_{N_0}^\infty \delta\lambda\, dN = a^{-1} (\delta\lambda)_0$,
903: where $(\delta\lambda)_0$ is the current change in likelihood at step
904: $N_0$. In practice, we terminate the optimization at $\delta\lambda =
10^{-6}-10^{-7}$, so that we expect to be within an additive offset of
906: $\leq 0.1$ of the true likelihood maximum. This typically occurs after
907: a number $N\sim 10^5$ of function evaluations.  The final accuracy is
908: merely linear in the exponential coefficient $a$, so that this
909: accuracy estimate should be reasonably robust.
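
The estimate above can be written out explicitly. Assuming the
exponential form holds beyond the current step, i.e.,
$\delta\lambda(N) = (\delta\lambda)_0 \exp[-a(N-N_0)]$ for $N \geq
N_0$, the remaining improvement is
%
\begin{displaymath}
\Delta \ln L \sim \int_{N_0}^\infty (\delta\lambda)_0\,
 e^{-a(N-N_0)}\, dN = {(\delta\lambda)_0 \over a} ,
\end{displaymath}
%
so that for $a \approx 10^{-5}$ a stopping value of
$(\delta\lambda)_0 = 10^{-6}$ gives $\Delta \ln L \sim 0.1$, and
$(\delta\lambda)_0 = 10^{-7}$ gives $\Delta \ln L \sim 0.01$.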
910: 
911: We ran a variety of tests in order to establish whether the algorithm
912: has a tendency of finding local extrema as opposed to global ones. In
913: particular, for some of the test cases to be discussed later in
914: \S\,\ref{sec:tests}, we started the iterative algorithm from different
915: initial conditions, to verify that the solutions thus obtained were
916: always in (statistical) agreement. Also, as will be shown in
917: \S\,\ref{sec:tests}, we find that the algorithm recovers the
918: properties of known input models with reasonable accuracy. While this
919: does not prove that the \sch code cannot end up in a local maximum, at
920: least it shows that the code does not end up in (potential) local
921: maxima that are far from the correct solution.
922: 
923: In practice we usually start the maximization procedure from a
924: homogeneous set of initial mass weights. We also investigated whether
925: the convergence to a solution can be sped up by starting the iterative
926: process from initial conditions that may already be reasonably close
927: to the final solution. For example, we ran tests starting from a set
928: of weights corresponding to a two-integral DF of the form $f(E,L_z)$
929: that already fit the light (surface brightness) profile followed by
930: the input data. Such a solution is easily obtained as the NNLS
931: solution of a matrix equation. We found that the same final answer was
932: reached in essentially the same number of iterations.
933: 
934: 
935: %\newpage
936: 
937: 
938: \section{Pseudo-Data and Comparison Distribution Functions}
939: \label{sec:tools}
940: 
941: In order to test the performance of our discrete \sch code, we
942: generate sets of simulated data drawn from a known phase-space
943: distribution function (DF). Unlike the case of using actual
944: observations of a real stellar system, this approach offers the
945: advantage of unambiguously knowing in advance the input properties
946: underlying the data, which an optimally-working code should be able to
947: ``recover''.  It also provides flexibility by allowing the possibility
948: of adapting the input data at will in order to test different aspects
949: of the code (\S\,\ref{sec:tests}). We discuss here the construction of
950: various sets of pseudo-data and the properties of the underlying
951: models.
952: 
953: 
954: 
955: 
956: \subsection{Simulated Datasets}
957: \label{sec:data}
958: 
959: Our simulated input data are obtained from a set of $f(E,L_z)$ DFs
960: derived by \citet{vdm98}, with the methodology for drawing N-body
961: initial conditions from these DFs described in \citet{vdm97b}. The
962: models have a constant mass-to-light ratio $\Upsilon$, and have
neither a central black hole nor an extended dark halo. They provide good
964: fits to available photometric and kinematic observations of the galaxy
965: M32 over the radial range from $1-20$ arcsec. However, this property
966: has no bearing on the present analysis. We only use the fact that
967: there is a known DF, and not that this DF resembles any realistic
968: stellar system. A two-integral $f(E,L_z)$ DF provides a useful test
969: case (see also \citealt{cre99}, \citealt{ver02}), and does not mean
970: that the model results would be less valid for more general DFs. Also,
971: the use of a constant $\Upsilon$ is motivated only to simplify the
972: test environment. Central black holes (e.g., \citealt{vdm98,geb00})
973: and extended dark halos (e.g., \citealt{rix97,cap06}) can be easily
974: implemented in any \sch code.
975: 
976: The luminous mass density is assumed to be axisymmetric and is
977: parameterized according to
978: 
979: \begin{equation}
980: \label{eq:light}
981: \rho(R,z) = \rho_{0}(m/b)^{\alpha}[1+(m/b)^2]^{\beta}[1+(m/c)^2]^{\gamma},
982: \end{equation}
983: 
984: \noindent with $m^2 \equiv R^2 + (z/q)^2$. Here, $q$ is the (constant)
985: intrinsic axis ratio, related to the projected (observed) axis ratio
986: $q_p$ via the inclination angle $i$, $q_p^2 = \cos^2 i + q^2\sin^2
987: i$. The parameters in equation (\ref{eq:light}) are set to
988: $\alpha=-1.435$, $\beta=-0.423$, $\gamma=-1.298$, $b=0.\arcsec55$,
989: $c=102.\arcsec0$, $q_p=0.73$, and $\rho_0=j_{0}\Upsilon_0$, with the
990: $V$-band luminosity density $j_0 =
991: 0.463\times10^5(q_p/q)$\lsun\,pc$^{-3}$, and $\Upsilon_0$ the
992: mass-to-light ratio in the $V$-band and in solar units. The adopted
993: distance is 0.7 Mpc. The models share the property of appearing the
994: same in projection on the sky, but correspond to different intrinsic
995: axis ratios as determined by the inclination angle $i$.
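
As a worked example of the relation between $q$ and $q_p$, inverting
$q_p^2 = \cos^2 i + q^2\sin^2 i$ for the two inclinations adopted in
the tests of this paper ($i=90\grad$ and $i=55\grad$) with
$q_p=0.73$ gives
%
\begin{displaymath}
q = \left( {q_p^2 - \cos^2 i \over \sin^2 i} \right)^{1/2}
 \approx \left\{
 \begin{array}{ll}
   0.73 & \mbox{$(i=90\grad)$},\\
   0.55 & \mbox{$(i=55\grad)$}.
 \end{array}\right.
\end{displaymath}
%
The less inclined model is therefore intrinsically flatter, as
expected for a fixed projected axis ratio.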
996: 
997: The even part $f_e$ of the DF $f(E,L_z)$ is uniquely determined by the
998: mass density $\rho(R,z)$ (e.g., \citealt{bt87}). To specify the odd
999: part $f_o$ of the DF we follow \citet{vdm94} and write
1000: %
1001: \begin{equation}
1002: \label{eq:odd}
1003: f_o = f_e\,(2\eta-1)\,h_u[L_z/L_{z,{\rm max}}(E)],
1004: \end{equation}
1005: %
1006: with $L_{z,{\rm max}}(E)$ being the angular momentum of a circular
1007: orbit of energy $E$ in the equatorial plane ($z=0$), and the auxiliary
1008: function $h_u$ defined by
1009: %
1010: \begin{equation}
1011: \label{eq:ha}
1012: h_u(x) = \left\{
1013:  \begin{array}{ll}
1014:     \tanh(ux/2)\,\,/\,\,\tanh(u/2)&  \mbox{$(u > 0)$},\\
1015:     x&  \mbox{$(u=0)$},\\
1016:     (2/u)\tanh^{-1}[x\tanh(u/2)]&  \mbox{$(u < 0)$}.
1017:  \end{array}\right.
1018: \end{equation}
1019: %
1020: The choice of the parameters $\eta$ and $u$ determines the degree of
1021: streaming of the dataset. These free parameters can have values in the
1022: ranges $0\leq \eta \leq 1$ and $-\infty < u < \infty$, with $\eta$
1023: controlling the fraction of stars in the equatorial plane with
clockwise rotation, and $u$ controlling the behavior of the stellar
1025: streaming with orbital shape. The family of functions $h_u$ is shown
1026: in Figure 1 of \citet{vdm94}. Combinations of $(\eta,u)$ that fit data
1027: for M32 are also discussed in that paper. Here we explore a variety of
1028: input datasets with different amounts of mean streaming and test the
1029: recovery of these properties by our discrete \sch code.
1030: 
1031: We generated 6 different datasets to test our discrete \sch code. By
1032: dataset we mean a number of particle $(x',y')$ positions on the sky
1033: with corresponding proper motions $(\mu_{\rm x'},\mu_{\rm y'})$ and
1034: LOS velocities $v_{\rm z'}$. For two chosen inclinations on the sky,
1035: $i=90\grad$ and $i=55\grad$, we produced three datasets resembling
1036: systems with varying degrees of rotation: a non-streaming system
1037: ($\eta=0.5$ and $u=1$), a maximally-streaming system ($\eta=1$ and
1038: $u=\infty$), and a third system with intermediate streaming ($\eta=1$
1039: and $u=0$). We label our different datasets as 90ns, 90is, and 90ms to
1040: indicate the non-streaming, intermediate-streaming, and
1041: maximally-streaming cases of $i=90\grad$, respectively. Similarly, for
1042: the $i=55\grad$ case, we have the 55ns, 55is, and 55ms datasets. The
1043: mass-to-light ratio used to generate the datasets is $\Upsilon_0=2.51$
1044: for $i=90\grad$ and $\Upsilon_0=2.55$ for $i=55\grad$, in units of
1045: \msun/$L_{\odot,V}$.
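
For reference, substituting the three parameter choices into
equations (\ref{eq:odd}) and (\ref{eq:ha}) gives the following odd
parts. This is only a sketch of the limiting forms, using
$h_u(x)\rightarrow {\rm sgn}(x)$ for $u\rightarrow\infty$ and
$x\neq0$:
%
\begin{displaymath}
f_o = \left\{
 \begin{array}{ll}
   0 & \mbox{(ns: $\eta=0.5$, any $u$)},\\
   f_e\,L_z/L_{z,{\rm max}}(E) & \mbox{(is: $\eta=1$, $u=0$)},\\
   f_e\,{\rm sgn}(L_z) & \mbox{(ms: $\eta=1$, $u\rightarrow\infty$)}.
 \end{array}\right.
\end{displaymath}
%
Assuming the usual decomposition $f = f_e + f_o$, the
maximally-streaming case thus has $f = 2f_e$ for $L_z > 0$ and $f = 0$
for $L_z < 0$, i.e., all stars rotate in the same sense.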
1046: 
1047: Although we examined the performance of our \sch code with tests that
1048: involve all of the six simulated datasets introduced above, we chose
1049: to use the 55is dataset to show most of our results. Figure
1050: \ref{fig:data55is} shows some projections of the phase-space
1051: coordinates for the 55is dataset.
1052: 
1053: 
1054: 
1055: 
1056: 
1057: 
1058: \subsection{Comparison DF}
1059: \label{sec:DF}
1060: 
1061: In order to quantitatively judge the performance of the three-integral
1062: \sch code, it is desirable to make a comparison between the properties
1063: of the input DF (i.e., that from which the pseudo-data were obtained)
1064: and those of the fitted DF (i.e., that found as the solution to the
1065: fitting or minimization problem). It is important to note in this
1066: context that the direct output of our \sch code is not in the form of
1067: a proper DF $f$, but rather in the form of ``mass weights'' $\zeta$
1068: associated to each set of integrals of motion $(E,L_z,I_3)$ that
1069: uniquely define an orbit. The relation between the DF and the orbital
1070: mass weights is through a volume element dependent on the three
1071: integrals and an integration over the 3-dimensional space associated
1072: to the particular orbit (see \citealt{voort84} for a detailed
1073: discussion). Such a conversion can be done in Schwarzschild codes
1074: (e.g., \citealt{tho04}), but this is not necessary for the goals of
1075: the present paper. We therefore limit ourselves to the comparison
1076: between the input and the fitted orbital mass weight distributions,
1077: which from now on we denote by $\zeta_{\rm in}(E,L_z,I_3)$ and
1078: $\zeta_{\rm fit}(E,L_z,I_3)$, respectively.
1079: 
1080: To validate the weights $\zeta_{\rm fit}(E,L_z,I_3)$ returned by the
1081: \sch code, we need to know the weights $\zeta_{\rm in}(E,L_z,I_3)$ for
1082: the model DF $f(E,L_z)$. This is not a simple problem in the absence
1083: of an analytic expression for $I_3$. However, two related functions
1084: are more easily accessible. The first is $\bar\zeta_{\rm in}(E,L_z)$,
1085: defined as the projection of $\zeta_{\rm in}(E,L_z,I_3)$ over the
1086: $E-L_z$ plane (i.e., integrated over $I_3$). Having the means of
1087: drawing N-body initial conditions from the DF \citep{vdm97b}, we know
1088: that the energy and z-component of the angular momentum of each
1089: particle are given by $E=\psi - {1\over 2}v^2$ and $L_z = R\cdot
1090: v_{\phi}$, respectively. Therefore, $\bar\zeta_{\rm in}(E,L_z)$ is
1091: easily obtained by binning a large N-body dataset $(N\sim 10^6)$ in
1092: $E$ and $L_z$. The second related function that is easily accessible
1093: is $\zeta_{\rm Kep,\lambda}(E,L_z,I_3)$, the distribution of mass
1094: weights for an $f(E,L_z)$ model of axial ratio $q$ and a power-law
1095: density profile with logarithmic slope $\lambda$ in a spherical Kepler
1096: potential. This function is calculated analytically in de Bruijne et
1097: al. (1996; their equation (38)), and has the form
1098: %
1099: \begin{equation}
1100: \label{eq:kep}
1101: \zeta_{\rm Kep,\lambda}(E,L_z,I_3) = E^{\lambda-4}\times j_{\lambda}\left[L_z/L_{z,{\rm max}}(E),I_3\right].
1102: \end{equation}
1103: %
1104: Here, $\lambda$ is the logarithmic slope of the mass distribution and
1105: $j_{\lambda}$ is a known function. The spherical Kepler potential is
1106: of course only an accurate approximation to our model at
1107: asymptotically large radii. Nonetheless, we can combine
1108: $\bar\zeta_{\rm in}(E,L_z)$ and $\zeta_{\rm Kep,\lambda}(E,L_z,I_3)$
1109: to obtain a reasonable approximation for $\zeta_{\rm in}(E,L_z,I_3)$
1110: throughout the system, namely
1111: %
1112: \begin{equation}
1113: \label{eq:zetain}
1114: \zeta_{\rm in}(E,L_z,I_3) \approx \bar\zeta_{\rm in}(E,L_z) \times g(I_3),
1115: %j_{\lambda}\left(g(L_z,E),I_3\right).
1116: \end{equation}
1117: %
1118: with
1119: %
1120: \begin{equation}
1121: \label{eq:I3}
1122: g(I_3) \equiv \frac{j_{\lambda}\left[L_z/L_{z,{\rm max}}(E),I_3\right]}{\int j_{\lambda}\left[L_z/L_{z,{\rm max}}(E),I_3\right]{\rm d} I_3}.
1123: \end{equation}
1124: %
1125: For $\lambda$ we take the slope of the mass distribution of equation
1126: (\ref{eq:light}) at $r=R_c$, the radius of the circular orbit of
1127: energy $E$ in the equatorial plane $(z=0)$. The function $\zeta_{\rm
1128: in}$ in equation (\ref{eq:zetain}) is correct (i.e., reduces to
1129: $\bar\zeta_{\rm in}$) when projected on the $E-L_z$ plane, and has
1130: approximately the correct distribution over $I_3$ at fixed
1131: $(E,L_z)$. In this way, we construct sets of orbital mass weights for
1132: each of our 6 simulated datasets described in \S\,\ref{sec:data}.
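The statement that equation (\ref{eq:zetain}) reduces to
$\bar\zeta_{\rm in}$ in projection can be verified directly. As a
short check that uses only the definitions in equations
(\ref{eq:zetain}) and (\ref{eq:I3}), with the integral taken over the
allowed range of $I_3$ at fixed $(E,L_z)$, note that $g$ is normalized
to unity by construction, so that
%
\begin{displaymath}
\int \zeta_{\rm in}(E,L_z,I_3)\,{\rm d}I_3 \approx
\bar\zeta_{\rm in}(E,L_z) \int g(I_3)\,{\rm d}I_3 =
\bar\zeta_{\rm in}(E,L_z) .
\end{displaymath}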
1133: 
1134: 
1135: 
1136: 
1137: 
1138: \section{Performance Tests}
1139: \label{sec:tests}
1140: 
1141: Using all the kinematic (pseudo) datasets and their corresponding
1142: input DFs described in \S\,\ref{sec:tools} we now proceed to examine
1143: how accurately the discrete \sch code can recover the properties of
the galaxy models used to generate the input datasets. By {\it
recovery} we mean determining how close the obtained solution is to
the known DF, known mass-to-light ratio \ml, and known
1147: inclination $i$ of the galaxy model corresponding to the simulated
1148: dataset that was provided as input to the code. At the same time, we
1149: investigate the reliability of the uncertainties provided by the code
1150: on each of these properties.
1151: 
1152: In the general case of modeling real observations of an actual stellar
1153: system, the true radial mass density profile is not known a priori and
1154: is typically described following some parameterization. Since mass may
1155: not necessarily follow light, or may do so in some complicated way,
1156: different plausible mass models should be attempted, as well as
1157: allowing for possible variations of the mass-to-light ratio with
1158: position. For the purposes of the present tests, however, the
1159: underlying mass distribution is assumed to be perfectly known from
1160: equation (\ref{eq:light}), except for the value of \ml. Therefore, the
1161: assumed parameterization for the mass distribution is only a
1162: 1-parameter family, and includes the ``correct'' distribution
1163: $(\Upsilon = \Upsilon_0)$. In applications to real data,
1164: higher-parameter families may be necessary, and there is no guarantee
1165: that any member of the family would provide a good approximation to
1166: the true underlying distribution.
1167: 
1168: The results of our tests are examined via three different exercises,
1169: which can be performed on each of the 6 different input datasets,
1170: providing a good baseline to judge the performance of our discrete
1171: \sch code. First, we explore the recovery of the internal orbital
1172: structure of the input dataset (i.e., the input DF, or more
1173: specifically, the input mass weights $\zeta_{\rm in}$) by feeding the
1174: code with the correct inclination and mass-to-light ratio \ml\,
1175: (\S\,\ref{sec:getDF}). Second, we fix the inclination to the correct
1176: value of the input dataset and study whether the code finds the
1177: minimum of the $\Delta \chi^2$ function at the correct value of $\Upsilon$
1178: (\S\,\ref{sec:getML}). And third, we explore grids of \sch models with
1179: different ($i$,\ml) combinations to study how well these two
1180: properties are recovered when they are both assumed unknown
1181: (\S\,\ref{sec:grids}).
1182: 
1183: We run all the above exercises for various subsets of each of our 6
1184: datasets in order to explore the results as a function of relevant
1185: observational variables, particularly the size of the input dataset
1186: and the type of kinematical constraints available (i.e., only LOS
1187: velocities, only proper motions, or the complete three-dimensional
1188: velocities). This adds even more elements for a thorough assessment of
1189: the code's performance. It also provides insights into the types of
1190: datasets that will be necessary to constrain $i$ or $\Upsilon$ to some
1191: given uncertainty in realistic situations.
1192: 
1193: Our \sch code has the capability of computing and storing, during a
1194: single orbit integration, the orbital properties for a series of
different values of \ml. During the construction of the orbit
library, each value of \ml\, is converted into a dimensionless
factor $v_{\rm s}$ that multiplies all our original velocities, so
that \ml\, scales simply as $v_{\rm s}^2$. We stress that this allows
1199: us to explore several values of \ml\, while computing only one orbit
1200: library. In our tests, we explore models for velocity factors in the
1201: range $0.8\leq v_{\rm s} \leq 1.2$. Given that our galaxy models with
1202: different inclinations have slightly different mass-to-light ratios
1203: $\Upsilon_0$, the use of this dimensionless representation also
1204: facilitates the visualization of the results in \S\,\ref{sec:getML}
1205: and \S\,\ref{sec:grids}. The correct (input) value of \ml\, is always
1206: at $v_{\rm s} = (\Upsilon/\Upsilon_0)^{1/2} = 1$.
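For orientation, and assuming only the scaling $\Upsilon \propto
v_{\rm s}^2$ stated above, the explored range of velocity factors maps
onto mass-to-light ratios as
%
\begin{displaymath}
0.8 \leq v_{\rm s} \leq 1.2 \quad \Longleftrightarrow \quad
0.64 \leq \Upsilon/\Upsilon_0 = v_{\rm s}^2 \leq 1.44 ,
\end{displaymath}
%
i.e., from 36\% below to 44\% above the input value.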
1207: 
1208: %We start by discussing the ``standard'' parameter settings with which
1209: %we have run most of our tests (\S\,\ref{sec:settings}). 
1210: 
1211: 
1212: 
1213: 
1214: \subsection{Standard Settings}
1215: \label{sec:settings}
1216: 
1217: At each of its different steps, the \sch code requires the user to
1218: specify several settings (or dials) that control a corresponding
1219: number of tasks and functions of the modeling procedure. Here we list
1220: the settings that we use for our standard run. We concentrate on the
1221: settings that are new to the discrete implementation. All other
1222: settings that are needed to fit a \sch model (e.g., the resolution and
1223: limits of the polar grids used to compute the gravitational potential
1224: $\Psi$; the required numerical accuracies in the fitting of the mass
1225: in the meridional plane and/or the projected plane of the sky; etc.)
1226: are identical to previous implementations of the code, so for those we
1227: refer to \citet{vdm98} and \citet{cre99}.
1228: 
1229: At the heart of the \sch method lies the generation of a comprehensive
1230: library of orbits that should be representative of all types of orbits
1231: possible in the gravitational potential under study. This is achieved
1232: by adequately sampling the ranges of values that the three integrals
1233: of motion $(E,L_z,I_3)$ can acquire, each set of values uniquely
1234: determining one possible orbit. In this work we build models using two
1235: libraries that only differ in their size. Most of our runs consist of
1236: the generation of initial conditions and libraries with
1237: $20\times14\times7 = 1960$ orbits, obtained by sampling the available
1238: integral space with 20 energies $E$, 14 angular momenta $L_z$ (7
1239: positive and 7 negative), and 7 third integrals $I_3$. In order to
1240: study the dependency of the results on the size of the orbit library,
1241: we also compute \sch models using a much larger orbit library, with
1242: $40\times28\times14 = 15680$ combinations of $(E,L_z,I_3)$.
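As a quick check of the relative sizes of the two libraries,
%
\begin{displaymath}
\frac{40\times28\times14}{20\times14\times7} = \frac{15680}{1960} = 2^3 = 8 ,
\end{displaymath}
%
since each of the three integrals is sampled twice as finely in the
large library.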
1243: 
1244: The energy $E$ is sampled via the corresponding radius $R_c$ of the
1245: circular orbit of that energy (that with maximum angular momentum) in
1246: the equatorial plane $(z=0)$. This radius is logarithmically sampled
1247: from a minimum value that we choose to be much smaller than the
1248: spatial resolution of the data, to a maximum value set much beyond the
1249: point at which most of the mass of the input distribution is actually
encompassed. Because they are totally unconstrained by the data, the
first and last few energy bins will mostly be of no interest (i.e., no
1252: mass gets assigned to them in the process of optimization). The
1253: vertical component of the angular momentum, $L_z$, is linearly sampled
1254: using the variable $\eta=L_z/L_{\rm max}$, where $\eta \in\,\,(0,1)$
1255: and $L_{\rm max}$ is the angular momentum of the circular orbit with
1256: energy $E$. While orbits with both positive and negative $L_z$ are
1257: included in the library, the latter need not be individually
1258: integrated because they are simply obtained by reversing the velocity
1259: vector at each point along the orbit. The third integral $I_3$ is
1260: sampled via an angle $w \in\,\,(0,w_{\rm th})$, where $w_{\rm th}$
1261: determines the position at which the ``thin tube'' orbit at the given
1262: $(E,L_z)$ touches its zero-velocity curve (defined by the equation $E
1263: = \Psi_{\rm eff}$, where $\Psi_{\rm eff} = \Psi -
1264: \frac{1}{2}L_z^2/R^2$ is the effective gravitational potential; see
1265: \citealt{vdm98} for a detailed presentation). Finally, in order to
1266: help alleviate the discrete nature of the numerical orbit library, some
1267: extra radial smoothing of the orbits is performed by randomly
1268: generating a small variation to the energy and computing and storing
1269: the contribution to the probabilities from the ``new'' orbit with
1270: integrals $(E+\delta E,L_z,I_3)$. It is possible to implement similar
1271: smoothing in $L_z$ and $I_3$ as well (e.g., \citealt{kra05,cap06}),
1272: but we leave this for a future version of our code. This energy
1273: smoothing is repeated, at each timestep, for 7 random $\delta E$
1274: values. Azimuthal averaging is also performed by randomly drawing 7
1275: $\phi$ values at each timestep.
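As an illustration of the logarithmic sampling of $R_c$ described
above (a sketch only: the symbols $R_{\rm min}$, $R_{\rm max}$, and
$N_E$ are introduced here for convenience, and we assume a grid that
includes both end points, which need not match the exact
implementation in the code), the $k$-th radius can be written as
%
\begin{displaymath}
R_{c,k} = R_{\rm min} \left(\frac{R_{\rm max}}{R_{\rm min}}\right)^{(k-1)/(N_E-1)},
\qquad k = 1,\ldots,N_E ,
\end{displaymath}
%
with $N_E = 20$ for the standard library, so that consecutive radii
differ by a constant factor.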
1276: 
1277: Smoothing in phase-space is accomplished with the use of apertures
(\S\,\ref{sec:pij}). The size of the (square-shaped) spatial
apertures is defined as a fraction of $R$, the distance to the center
1280: of the stellar system in the projected plane, and we set this fraction
1281: to 10\%. In velocity space, and for most practical applications, the
1282: measurement errors $e_{ik}$ themselves will provide sufficient
1283: ``natural'' smoothing for numerical purposes. Thus, we set the factors
$\xi_k$ in equation (\ref{eq.weights}) to zero for all our tests (see
1285: also discussion in \S\,\ref{sec:pij}). In practice, the optimal value
1286: of $\xi_k$ will depend on the characteristics of the data
1287: (particularly the size of the velocity errors) as well as the stellar
1288: system under study. When dealing with actual data, therefore, at least
1289: a few different values should be tried in order to explore their
1290: impact on the results. Additionally, the extra smoothing provided by
1291: $\xi_k$ can also be useful to explore the validity of the quoted
1292: errors in any given application.
1293: 
1294: The uncertainties $e_{ik}$ in the LOS velocities and/or proper motions
are in practice determined by the details of the observations and,
since they are obtained by different techniques (spectroscopy versus
astrometry), generally differ in size. Furthermore, the
1298: uncertainties in the velocities tangential to the plane of the sky are
1299: affected by the uncertainty in the distance to the stellar system
1300: under study. Here, however, since we deal with simulated data, we
assume kinematic data of good quality by current standards, and simply
1302: set all these errors to a moderate and arbitrary value of $e_{ik} =
1303: 7.1$\kms.
1304: 
1305: The large majority of our tests were done on the simulated datasets as
1306: described in \S\,\ref{sec:data} {\it without} the addition of
1307: simulated observational errors (i.e., random Gaussian deviates with
1308: dispersion $e_{ik}$). This simplification was made early on in our
1309: project, based on the fact that the velocity errors should not matter
much as long as they are much smaller than the average one-dimensional
1311: velocity dispersion of the system under study. However, we realized
1312: later that this does induce a slight bias in our estimated
1313: mass-to-light ratios.  Our typical simulated datasets have dispersions
of 48.4\kms\ and 46.3\kms, for the 55is and 90is cases,
1315: respectively. Therefore, by not adding any random velocity errors, the
1316: one-dimensional velocity dispersion of the pseudo-data that we
1317: actually analyzed is too small by a factor $h =
1318: (1+(e_{ik}/\sigma)^2)^{1/2}$. As a consequence of the virial theorem,
1319: it follows that we should expect to infer a mass-to-light ratio that
1320: is too small by a factor of $h^2$, corresponding to about 2.2\% and
1321: 2.4\% for the 55is and 90is datasets, respectively. Instead of
1322: rerunning all our calculations, which would have been computationally
1323: expensive, we therefore simply corrected for this bias {\it post
1324: facto}. So when studying the recovery of the mass-to-light ratio in
1325: \S\,\ref{sec:getML} and \S\,\ref{sec:grids}, instead of comparing the
1326: inferred values to the value $\Upsilon_0$ of the input model, we
1327: compare to the slightly smaller $\Upsilon_0^* = \Upsilon_0/h^2$. This
1328: quantity is $\Upsilon_0^* = 2.451$ for $i=90^{\circ}$ and
1329: $\Upsilon_0^* = 2.498$ for $i=55^{\circ}$.
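The quoted percentages follow directly from the numbers above. With
$e_{ik} = 7.1$\kms\ one finds
%
\begin{displaymath}
h^2 = 1 + \left(\frac{7.1}{48.4}\right)^2 \approx 1.0215 \quad ({\rm 55is}), \qquad
h^2 = 1 + \left(\frac{7.1}{46.3}\right)^2 \approx 1.0235 \quad ({\rm 90is}),
\end{displaymath}
%
where the step from velocity dispersion to mass-to-light ratio uses
the virial scaling $\Upsilon \propto M \propto \sigma^2$ at fixed
size and luminosity.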
1330: 
1331: The sizes of currently existing kinematic datasets of discrete nature
1332: range from a few hundred datapoints (red giants in Local Group dwarf
1333: galaxies, planetary nebulae in the outskirts of giant ellipticals) to
1334: a few thousands (stars in Galactic globular clusters, systems of
1335: globular clusters around giant ellipticals). For our standard tests we
1336: adopt datasets with 1000 kinematic observational points, although we
also examine datasets with sizes ranging
1338: from 100 to 2000 datapoints. In these tests, the small-$N$ datasets
1339: are subsets of the largest dataset ($N = 2000$), which means that
1340: there will be some correlation between the results of experiments done
as a function of the number of available observations. We note that
this approach does not differ substantially from one in which the
datasets of different $N$, drawn from the same simulation, are
completely disjoint. The progression with $N$ should still follow the expected
1345: $N^{-1/2}$ statistical-convergence behavior (see
1346: Fig.~\ref{fig:errors_ML} and \S\,\ref{sec:getML}). The generation of
1347: one of our smaller $20\times14\times7$ orbit libraries, simultaneously
1348: storing discrete probabilities for a set of 1000 observational points
1349: with both LOS velocities and proper motions, takes 2.5 hours on a 3.6
GHz, 64-bit Pentium 4 CPU with 2 GB of memory. An additional 0.5 hours
1351: are needed to find the maximum likelihood fit to the data. In
1352: practice, these steps must be iterated over a grid of gravitational
1353: potential parameterizations.
1354: 
1355: 
1356: 
1357: %.... enforce meridional plane dispersion constraints?: NO ....
1358: %
1359: %.... fractional error in meridional plane mass: 0.05 ....
1360: %
1361: %.... fractional error in vel merid disp constraints: 0.20 ....
1362: %
1363: %.... settings on projected cubes used to store info ....
1364: 
1365: 
1366: 
1367: 
1368: 
1369: 
1370: \subsection{Recovery of the Distribution Function}
1371: \label{sec:getDF}
1372: 
1373: In order to determine whether the best-fitting solution obtained by
1374: the discrete \sch code actually resembles the properties of the input
1375: data, we start by making detailed comparisons between the input and
1376: fitted DFs. To do this, we feed the code with the correct inclination
1377: and mass-to-light ratio \ml\, used to generate the input datasets, and
1378: compare the fitted mass weights $\zeta_{\rm fit}$ to those
1379: corresponding to the input data, $\zeta_{\rm in}(E,L_z,I_3)$,
1380: approximated using equation (\ref{eq:zetain}). We use datasets with
1381: 1000 LOS velocities and proper motions, and present results for both
1382: the small and big orbit libraries detailed in \S\,\ref{sec:settings}.
1383: 
1384: The comparison is best achieved via the analysis of corresponding one-
1385: and two-dimensional projections of the cubes of mass weights
1386: $\zeta_{\rm fit}(E,L_z,I_3)$ and $\zeta_{\rm in}(E,L_z,I_3)$, obtained
1387: by integrating over two and one of the integrals of motion,
1388: respectively (Figs. \ref{fig:1Dplots} to \ref{fig:2Dplot55is}).  Also,
1389: we make comparisons of two-dimensional $L_z-I_3$ slices of both cubes
at selected values of the energy (Fig. \ref{fig:Ebins55is}). For all
1391: of these projections we quantify the agreement between fits and input
1392: data by computing the RMS and median absolute deviation of the
1393: quantity $(\zeta_{\rm fit}-\zeta_{\rm in})/\zeta_{\rm in}$, i.e., the
1394: difference between fit and input mass weights normalized by the input
1395: mass weights. These statistics are listed for \sch models run on all
1396: our input datasets in Table 1. Since the RMS can be biased
1397: disproportionately by a small number of large outliers, in our
1398: discussion below we use preferentially the median absolute residual.
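For concreteness, one way to write these statistics is the following
(our notation: we assume that the $n$ cells of a given projection
lying inside the constrained region are labeled by $k$ and have
$\zeta_{{\rm in},k} > 0$, and we take the median absolute deviation to
mean the median of $|\delta_k|$, consistent with the term ``median
absolute residual'' used above):
%
\begin{displaymath}
\delta_k \equiv \frac{\zeta_{{\rm fit},k}-\zeta_{{\rm in},k}}{\zeta_{{\rm in},k}}, \qquad
{\rm RMS} = \left( \frac{1}{n}\sum_{k=1}^{n} \delta_k^2 \right)^{1/2}, \qquad
{\rm MAD} = {\rm median}_k\, |\delta_k| .
\end{displaymath}
%
A single cell with large $|\delta_k|$ enters the RMS quadratically but
barely shifts the median, which is why the latter is the more robust
of the two.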
1399: 
1400: Figure \ref{fig:1Dplots} shows, for the 55is case, the integrated mass
1401: weights as a function of each of the three integrals of motion, for
1402: both the input dataset and the discrete \sch fit. Inside the region
1403: actually constrained by kinematic data (containing 99.83\% of the
total mass), the median absolute deviations between the fitted and input
1405: distributions of mass weights are 3\%, 16\%, and 18\%, for the
1406: integrated distributions as a function of $E$, $L_{\rm z}$, and $I_3$,
1407: respectively. As listed in Table 1, similar numbers are obtained for
1408: the other 5 simulated datasets, with the agreement between both
1409: distributions as a function of energy always better than 5\%. As a
1410: function of $L_z$, the largest disagreement actually corresponds to
1411: the one shown in Figure \ref{fig:1Dplots}, the 55is case. It goes down
1412: to 7\% for our case of closest agreement, the case labeled 55ns. The
1413: net rotation inherent to the 55is dataset (reflected in the middle
1414: panel by all the mass weights with positive $L_z$ being larger than
1415: those with negative $L_z$) is clearly reproduced by the \sch fit. As a
1416: function of the third integral, the median absolute deviation varies
1417: from 16\% for the 55ns case to up to 25\% for the 90ns case. Note
1418: that, since we are showing orbital mass weights instead of the actual
1419: DF, the $I_3$ distributions are not expected to be constant over
1420: $I_3$, even though the input DF underlying all simulated datasets is
1421: of the form $f(E,L_z)$.
1422: 
1423: Next, integrating only over $I_3$, we show in Figures
1424: \ref{fig:2Dplot55ns} and \ref{fig:2Dplot55is} the agreement between
1425: the fitted and input sets of mass weights as a function of $E$ and
1426: $L_z$, for the 55ns and 55is cases, respectively. The upper panels of
1427: these figures show the results of the \sch fit ($\zeta_{\rm fit}$) and
1428: the lower panels the original input distributions ($\zeta_{\rm
in}$). The left-hand panels show the results for an $(E,L_z,I_3)$
1430: library of $40\times28\times14$ orbits, 8 times larger (i.e., finer)
1431: than that of the right-hand panels, which correspond to our standard
1432: case of $20\times14\times7$ orbits.  Only the energy range constrained
1433: by the respective sets of data is shown. Black corresponds to zero
1434: weight, and the white (brightest) color in each pair of panels (fit
1435: and model, or upper and lower) has been assigned to the maximum
1436: orbital weight among the two panels, so that the comparison between
1437: fits and models is made using the same color scale.
1438: %(WILL HAVE TO CHANGE THIS AND SET WHITE TO
1439: %THE MAXIMUM AMONG ALL PANELS ... STILL TO DO).
1440: 
1441: Both Figures \ref{fig:2Dplot55ns} and \ref{fig:2Dplot55is} show that
1442: the main features of the input $E-L_z$ distributions of mass weights
1443: are well reproduced by the 3-integral \sch fits. In particular, the
1444: mean streaming properties of both datasets are satisfactorily
1445: recovered.  In Figure \ref{fig:2Dplot55ns}, the two prominent
1446: phase-space blobs occupying symmetrical locations on the negative and
1447: positive sides of the $L_z$-axis correspond well with the non-rotating
1448: overall nature of the 55ns dataset. Moreover, this is recovered by
1449: both the models with standard and large orbit libraries (right- versus
1450: left-hand panels). Similarly, in Figure \ref{fig:2Dplot55is}, the
1451: single phase-space blob at positive $L_z$ with a pronounced elongation
1452: towards negative $L_z$ (in light blue and blue), indicative of the
1453: rotating nature of the 55is case, is reproduced by the \sch fit as
1454: well. The median absolute deviations between the fitted and input
1455: $E-L_z$ distributions, always restricted to the energy range
1456: constrained by the data, are 14\% and 19\% for the 55ns and 55is
1457: cases, respectively (Table 1).
1458: 
1459: In Figure \ref{fig:Ebins55is} we show the 3-dimensional distributions
1460: of mass weights of our 55is case in the form of a series of $L_z-I_3$
1461: planes at different energies. Here again, the upper panels show the
1462: results of the discrete \sch fit ($\zeta_{\rm fit}$), the lower panels
1463: the distribution of mass weights corresponding to the input data
1464: ($\zeta_{\rm in}$), and the color scale is set up in the same way as
1465: in the $E-L_z$ figures. As the energy $E$ is sampled via the radius
1466: $R_c$ of the circular orbit (its value in arcmin indicated at the top
1467: of each pair of panels), this series of planes shows the variation of
1468: the $L_z-I_3$ distribution with increasing distance from the center of
1469: the galaxy. The fraction of the total mass at each energy slice is
1470: given as a percentage at the bottom of each panel.
1471: 
1472: The bottom panels of Figure \ref{fig:Ebins55is} indicate that, in the
1473: inner regions (inside 0.2 arcmin), most of the mass in the 55is
1474: dataset is concentrated in orbits with $L_z$ near zero. The
1475: corresponding upper panels show that the \sch fit recovers this $L_z
1476: \approx 0$ component, but it distributes more weight than the input
1477: model into orbits with positive $L_z$. These inner regions,
1478: nevertheless, have a relatively low mass content in comparison with
1479: regions at larger radii. As the radius increases, the $L_z \approx 0$
1480: region of phase-space gets progressively depleted of stars in favor of
1481: orbits with high $L_z$. This transition is reasonably well reproduced
1482: by the \sch solution, and the agreement between fit and input data
1483: becomes better at large radii, at which point most of the mass at each
1484: energy is concentrated in orbits of high $L_z$.
1485: 
1486: Note also that a common characteristic of Figures
1487: \ref{fig:2Dplot55ns}, \ref{fig:2Dplot55is}, and \ref{fig:Ebins55is} is
1488: that \sch fits typically present mass weight distributions that appear
1489: broader (more extended) and less peaked than the corresponding
1490: distributions displayed by the pseudo-data. The effect is most obvious
1491: among the right-most panels of Figure \ref{fig:Ebins55is}, where one
1492: can see that the $L_z-I_3$ mass-weight distributions of the input data
1493: (lower panels) have higher peaks and overall sharper features than the
1494: corresponding fitted distributions (upper panels). This is an expected
1495: effect and is due to the combined smoothing of the fitted distribution
1496: introduced both by the (necessary) use of velocity apertures for the
1497: computation of likelihoods (see \S\,\ref{sec:pij}), and by the
1498: regularization constraints imposed in order to enforce smoothness in
1499: phase space. While the first smoothing is particular to our discrete
1500: implementation, the second is a well-known procedure common to most
1501: \sch codes. Models without regularization tend to be unrealistically
1502: noisy \citep{vdm98} and unreliable for parameter estimation
1503: \citep{cre04}. Thus, although we choose to plot the input distribution
of mass weights as they actually are, the fairest comparison
1505: would be one in which the \sch fit is compared with a smoothed version
1506: of the original mass weight distribution describing the input data. We
1507: explored this by convolving the input distribution of mass weights
1508: with a (circular) Gaussian kernel, and then computing the same
1509: statistics shown in Table 1 (but this time using the smoothed version
1510: of the input distribution) for different widths of the Gaussian
1511: kernel. We have verified that indeed it is possible to find a kernel
1512: width for which the agreement between fit and input data is best,
improving both the RMS and median absolute deviations of Table 1 by
1514: factors between 1.2 and 1.5. Finally, we also note that the comparison
1515: in Figure \ref{fig:Ebins55is} might be affected by the accuracy of the
1516: approximation in equation (\ref{eq:zetain}), which means that the
values in Table 1 are actually upper limits to the true errors of
the \sch fits.
1519: 
1520: 
1521: 
1522: From these tests we conclude that our discrete \sch code can
1523: successfully recover the original DF inside the region constrained by
1524: the kinematic data, at least for the case in which the inclination and
1525: mass-to-light ratio are assumed known.
1526: 
1527: 
1528: 
1529: 
\subsection{Recovering the Mass-to-Light Ratio}
1531: \label{sec:getML}
1532: 
1533: For a large range of potential applications of a \sch code, such as
1534: investigating dark matter halos in galaxies, the most important
1535: property that one is interested in measuring with confidence is the
1536: mass-to-light ratio. In the present tests, this quantity is a scalar,
1537: \ml, although in more general applications it could be a function of
1538: radius. In this section we study in detail the capacity of our code to
1539: infer the correct \ml\, when the inclination of the system is assumed
1540: known. Tests were performed for a number of input datasets in order to
1541: investigate the dependence of the results on key observational
1542: variables such as the number of kinematic measurements and the type of
1543: kinematic constraints available (i.e., only-LOS velocities, only
1544: proper motions, as well as both LOS velocities and proper
1545: motions). All models in this section were computed using our small
1546: orbit library, with $20\times14\times7$ combinations of $(E,L_z,I_3)$.
1547: The results of these experiments are summarized in Figures
1548: \ref{fig:MLparabN}\,-\,\ref{fig:errors_ML}.
1549: 
1550: For the 90is and 55is cases and using full 3-dimensional velocity
1551: information, Figure \ref{fig:MLparabN} shows the $\Delta \chi^2$
1552: parabolae obtained when applying the discrete \sch code with a number
1553: of \ml\, values, distributed around the correct one ($\Upsilon_0^*$),
1554: for datasets of varying sizes. The quantity
$(\Upsilon/\Upsilon_0^*)^{1/2}$ along the abscissa denotes the
1556: velocity scaling; $(\Upsilon/\Upsilon_0^*)^{1/2} = 1$ corresponding to
1557: a \sch model with the input value $\Upsilon_0^*$, defined in
1558: \S\,\ref{sec:settings}. The zero point of the vertical axis (both in
1559: Figures \ref{fig:MLparabN} and \ref{fig:MLparab2}) is arbitrary, but
1560: the difference $\Delta\chi^2$ between points on the same curve has its
1561: usual statistical meaning, and indeed we compute the (random)
1562: uncertainties on the determination of \ml\, directly from them.
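To make the connection explicit, near its minimum each curve can be
approximated by a parabola. Writing $x \equiv
(\Upsilon/\Upsilon_0^*)^{1/2}$ for the velocity scaling, with best-fit
value $\hat x$ and $1\sigma$ width $\sigma_x$ (symbols introduced here
for illustration only), and measuring $\Delta\chi^2$ from the minimum
of the curve, we have
%
\begin{displaymath}
\Delta\chi^2(x) \approx \left(\frac{x-\hat x}{\sigma_x}\right)^2 ,
\end{displaymath}
%
so that $\Delta\chi^2 = 1$ at $x = \hat x \pm \sigma_x$. Because
$\Upsilon \propto x^2$, the corresponding fractional uncertainty in
the mass-to-light ratio is, to first order,
$\Delta\Upsilon/\Upsilon \approx 2\sigma_x/\hat x$.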
1563: 
1564: Figure \ref{fig:MLparabN} shows that the difference $\Delta\chi^2$
1565: between points on the same curve becomes larger (the parabolae become
1566: narrower) as the number of available kinematic measurements increases.
1567: The determination of the best-fit \ml\, also depends on the type of
1568: available kinematic measurements. This is illustrated in Figure
1569: \ref{fig:MLparab2}, where we plot the $\Delta \chi^2$ parabolae
obtained when considering only LOS velocities, only proper motions, or
1571: the full 3-dimensional velocity information. All cases are for the
1572: 55is dataset with 1000 kinematic measurements. In this case, the
1573: $\Delta\chi^2$ parabolae become narrower as the number of available
1574: velocity components increases.
1575: 
1576: Furthermore, the statistical errors are generally smaller for larger
1577: datasets, as well as when more velocity components are available. This
1578: is shown in Figure \ref{fig:errors_ML}, where we plot the behavior of
1579: the best-fit \ml\, and its uncertainties as a function of $\log(N)$,
1580: where $N$ is the number of datapoints. The uncertainties
1581: $\Delta\Upsilon$ displayed in the upper panel of Figure
1582: \ref{fig:errors_ML} represent the $1\sigma$ intervals around the
1583: minimum of the parabolae in Figures \ref{fig:MLparabN} and
1584: \ref{fig:MLparab2}, and are defined as half the distance between the
1585: points on the curve where $\Delta \chi^2 = 1$ with respect to the
1586: minimum. The statistical errors scale roughly as $N^{-1/2}$ over an
1587: interval of 1.3 dex in $\log N$. Also, the errors in the best-fit
\ml\, associated with datasets with only proper motions (triangles) are
smaller than those associated with only-LOS datasets (open circles) for
1590: any value of $N$.  In other words, our discrete \sch code satisfies
1591: the fundamental statistical expectation that it should become easier
1592: for the method to distinguish between models with different \ml\, when
1593: the amount of observational information is larger. In the case of
1594: datasets with the full 3-d velocity information, the 55is
1595: uncertainties do not quite seem to follow the $N^{-1/2}$ behavior
1596: expected from statistics. We attribute this to our tests having
1597: reached a fundamental floor due to the discrete nature of the models,
a limit that cannot be overcome by increasing the number $N$ of
1599: available measurements. This can cause an apparent flattening with
1600: respect to the regular $N^{-1/2}$ behavior at large $N$.
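For reference, the sample sizes explored here, from $N=100$ to
$N=2000$, span $\log(2000/100) \approx 1.3$ dex, so that a pure
$N^{-1/2}$ scaling would imply
%
\begin{displaymath}
\frac{\Delta\Upsilon(N=100)}{\Delta\Upsilon(N=2000)} =
\left(\frac{2000}{100}\right)^{1/2} \approx 4.5
\end{displaymath}
%
between the smallest and largest datasets.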
1601: 
1602: To test the robustness of the errors estimated as above, we performed
1603: the following exercise. Selecting 10 different (disjoint) realizations
1604: of the N-body data (for the 55is case with 1000 measurements of only
1605: line-of-sight velocities, the case most often found in practice), we
1606: repeated the exercise of Figures \ref{fig:MLparabN} and
1607: \ref{fig:MLparab2} and computed discrete \sch models for a set of
1608: different \ml\, values distributed around the correct input one. This
1609: was done using our small orbit library. We obtained an average
1610: best-fit \ml\, of 2.46 (less than $1\sigma$ away from the input value,
1611: $\Upsilon_0^*=2.498$), with an RMS of 0.074 (corresponding to about
1612: 3\%). When computing the statistical uncertainties using the $\Delta
1613: \chi^2$ parabolae as described above, the average $1\sigma$ error in
1614: the best-fit \ml\, of the set of experiments turns out to be 0.204,
equivalent to a fractional error of 8\%. This is a factor of about 2.8
1616: larger than the scatter in the results from multiple independent
1617: realizations of the pseudo-data. This gap is smaller when additional
1618: information about the individual kinematics of the tracers is
1619: available. Indeed, repeating the above exercise for the same datasets
but now using two-dimensional proper motions instead of only
1621: line-of-sight velocities, the average error in \ml\, computed from our
1622: $\Delta \chi^2$ parabolae is 0.112, a factor of 1.7 larger than the
1623: scatter of the best-fit values, which was 0.067. Therefore, we
1624: conclude that our error estimation using $\Delta\chi^2$ is
1625: conservative.
1626: 
1627: Despite the smaller statistical errors for the case with
1628: proper-motions alone, the bottom panel of Figure \ref{fig:errors_ML}
1629: indicates that the best-fit \ml\, is closer to the real value,
1630: $\Upsilon_0^*$, for the case with only-LOS velocities. While the
1631: best-fit \ml\, from datasets with only LOS velocities are well within
1632: $1\sigma$ of $\Upsilon_0^*$ for any $N$, this is not the case for the
1633: datasets with only proper-motions, with best-fit \ml\, values that are
1634: $2-4\sigma$ away from $\Upsilon_0^*$. Still, the formal best-fit \ml\,
1635: for the case of full 3-d velocities (thick squares) is on average
1636: within $2\sigma$ of the real value, $\Upsilon_0^*$, corresponding to
1637: better than $\sim 6$\% accuracy. One contribution to the small
1638: systematic bias in \ml\, may come from the fundamental nature of
1639: inverse problems in general (of which \sch modeling is an example),
1640: namely, that there may not necessarily be a unique solution: it may be
1641: possible to change the mass profile and the DF without appreciably
1642: changing the model predictions. If such is the case and there are
1643: multiple solutions, we do not necessarily expect a flat $\Delta
1644: \chi^2$ profile (i.e., with a number of equally acceptable solutions
1645: containing the correct one), most likely because of numerical noise
1646: and discretization effects. While we cannot rule this out, our results
1647: do show that this probably does not affect the recovered mass-to-light
1648: ratio at more than the $\sim 10$\% level (based on Figure
1649: \ref{fig:errors_ML}, built with models using our smaller orbit
1650: library). Unless superb data are available, random uncertainties are
likely larger than such systematic errors. Currently, the only
exceptions to this are some Galactic globular clusters, for which
1653: thousands of proper motions are being measured. However, such systems
1654: are often closer to spherical than galaxies, and hence one expects any
1655: theoretical degeneracies to be smaller. Alternatively, numerical noise
1656: in the orbit library may be the cause of this systematic bias in \ml\,
1657: seen in the bottom panel of Figure \ref{fig:errors_ML}.  Numerical
1658: noise may be reduced in part by the use of larger orbit
1659: libraries. Indeed, we show in \S\,\ref{sec:grids} below that a
1660: substantially larger orbit library tends to produce more accurate
1661: results overall.
1662: 
1663: The likelihood ratio statistic $\Delta \chi^2$ in
1664: Figures~\ref{fig:MLparabN} and \ref{fig:MLparab2} allows us to find
1665: the best-fit model parameters and their confidence intervals.
However, it does not shed light on the question of whether the best-fit
1667: model is actually statistically consistent with the data. The
1668: likelihood $\ln L$ of the best-fit model also cannot be used for this
1669: purpose. There is no theorem of mathematics that states what the value
1670: of $\ln L$ should be for a statistically acceptable model, given that
1671: the underlying velocity distributions from which the particles are
1672: drawn are not known a priori (and are not generally Gaussian).
1673: Nonetheless, many other statistics can be defined to address this
1674: issue once the best-fit model has been found. For example, the
1675: velocity moments of the best-fit model can be calculated (as a
1676: function of position on the sky), and statistics can be defined that
1677: assess whether these moments are consistent with the observed data.
1678: Alternatively, one can draw random realizations of the data from the
1679: best-fit model and use a Kolmogorov-Smirnov test to assess whether the
1680: data and the realization are consistent with being drawn from the same
underlying distribution. We have explored a subset of these
approaches, and they suggest that the best-fit models are indeed
statistically consistent with the pseudo-data they were designed to fit.
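
For concreteness, the two-sample version of this test (written here in
its standard textbook form; the symbols are introduced only for this
illustration) compares the empirical cumulative distributions $F_n$ of
the $n$ observed velocities and $G_m$ of the $m$ velocities in a
realization through
%
\begin{displaymath}
D_{n,m} = \sup_{v}\,\left| F_n(v) - G_m(v) \right| ,
\end{displaymath}
%
and, for large samples, rejects the hypothesis of a common parent
distribution at significance level $\alpha$ if
%
\begin{displaymath}
D_{n,m} > \sqrt{-\frac{1}{2}\ln\frac{\alpha}{2}}\;\sqrt{\frac{n+m}{n\,m}} ,
\end{displaymath}
%
where the first factor is $\approx 1.36$ for $\alpha = 0.05$.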
1684: 
1685: \subsection{Recovering the inclination and M/L}
1686: \label{sec:grids}
1687: 
In general, neither the mass-to-light ratio nor the inclination of a
stellar system under study is known in advance, and thus one has to
1690: explore models with several combinations of both parameters in a
1691: search for those values that provide the best fit to the data. In this
1692: section we present and discuss the results of running the discrete
1693: \sch code on grids of $(i,\Upsilon)$ values to study whether the
1694: correct input combination is recovered. As in \S\,\ref{sec:getML}, we
1695: perform tests on datasets with different types of kinematic
1696: constraints (LOS velocities and/or proper motions).
1697: 
The results of these tests are presented in Figures \ref{fig:55is_LOS_MU}
1699: and \ref{fig:grid_55_90}. They show $\Delta\chi^2$ contours that
1700: result when computing discrete \sch models on a grid of $(i,\Upsilon)$
1701: values, including the correct input combination, for a variety of
1702: input datasets of the 55is and 90is cases. The goodness-of-fit
1703: parameter $\Delta\chi^2$ shown in these plots is obtained by first
1704: rebinning with a much finer grid the $(i,\Upsilon)$ space explored by
1705: the models actually calculated (indicated by small dots), and then
1706: computing the values on this new grid via interpolation between the
1707: nearest calculated models. We then determine the minimum on the finer
1708: grid (whose location is indicated by the star) and subtract it from
1709: all grid points to obtain the $\Delta\chi^2$ parameter, for which
1710: contours are shown. As in the case of Figures \ref{fig:MLparabN} and
1711: \ref{fig:MLparab2}, the mass-to-light ratio is parameterized by the
1712: dimensionless velocity scaling $v_{\rm
1713: s}=(\Upsilon/\Upsilon_0^*)^{1/2}$, so that the input value corresponds
1714: to $v_{\rm s}=1$.
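
Inverting this definition gives $\Upsilon = v_{\rm s}^2\,\Upsilon_0^*$,
so that a small offset $\delta v_{\rm s}$ from unity corresponds to a
fractional offset in mass-to-light ratio about twice as large,
%
\begin{displaymath}
\frac{\Upsilon-\Upsilon_0^*}{\Upsilon_0^*} = v_{\rm s}^2 - 1
\simeq 2\,\delta v_{\rm s}
\qquad (v_{\rm s} = 1 + \delta v_{\rm s},\ |\delta v_{\rm s}| \ll 1).
\end{displaymath}
%
As a worked example (the value of $v_{\rm s}$ is chosen only for
illustration), $v_{\rm s}=1.05$ corresponds to
$\Upsilon = 1.1025\,\Upsilon_0^* \simeq 2.75$ for $\Upsilon_0^*=2.498$.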
1715: 
1716: We start by showing in Figure \ref{fig:55is_LOS_MU} the results of
1717: running grids of models for input datasets composed of only-LOS
1718: velocities and only proper motions, in both cases for the 55is case
1719: with 1000 observational datapoints, and using our small orbit library
1720: with $20\times14\times7$ combinations of $(E,L_z,I_3)$. Overall, and
1721: in agreement with the results of Figure \ref{fig:MLparab2} discussed
1722: in \S\,\ref{sec:getML}, the $\Delta\chi^2$ contours indicate that
1723: proper motions (bottom panel) better constrain the best-fit
1724: $(i,\Upsilon)$ combination than a dataset with only-LOS velocities
1725: (upper panel). The $3\sigma$ uncertainties (thick contours) obtained
from the only-LOS dataset are about twice as large as those from the
1727: proper motions alone (31\% and 16\%, respectively). The input
1728: mass-to-light ratio $\Upsilon_0^*$ is adequately recovered by both
1729: datasets (to within the $1\sigma$ confidence region). The best-fit
1730: inclination, however, is offset from the actual input value
1731: $i=55\grad$ for both datasets, although somewhat closer to the correct
1732: value in the case of proper motions only. The $3\sigma$ uncertainties
1733: in the best-fit inclination are $\pm 6\grad$ and $\pm 11\grad$ for the
1734: only proper motions and only LOS cases, respectively.
1735: 
1736: Difficulties in constraining the inclination using \sch modeling of
stellar kinematics have been encountered in the past. A good recent
example is that of \citet{kra05}, who studied the E4 galaxy NGC 2974
using integrated stellar LOS velocity profiles and ionized gas
observations. In a test analogous to the present one, they constructed
simulated observations of this galaxy and fed them to their
``continuous'' (as opposed to discrete) \sch code in order to study
the recovery of the input mass-to-light ratio and inclination. They
find that even with artificially perfect input kinematics the
inclination is very poorly constrained. The same conclusion is reached
when attempting to fit the actually observed LOS velocity profiles
with \sch models. Stellar LOS velocity profiles therefore provide only
weak constraints on the inclination of this system, a statement they
can make with confidence because the actual inclination of NGC 2974 is
known from observations of its extended disc of neutral and ionized
gas in rapid rotation.
1752: 
1753: While one could expect that the availability of proper motion
1754: measurements in addition to LOS velocities would enhance the ability
1755: of the models to obtain useful constraints on the inclination of a
1756: stellar system in general, the reality is that the current
1757: state-of-the-art of \sch modeling does not have a definitive answer on
1758: this issue yet. As recent studies of the kinematics of stars in
globular clusters seem to indicate, the chances of success are highly
dependent on the quality and quantity of available data on the system
1761: under study (compare, for example, the results of \citealt{glenn06}
1762: and \citealt{bos06} regarding the best-fit inclinations of $\omega$
1763: Cen and M15, respectively).
1764: 
1765: There are at least two factors that may contribute to the difficulty
1766: in recovering the inclination from stellar kinematics: degeneracies
1767: inherent to \sch models, and numerical noise. First, there is no
1768: guarantee that inclinations other than the correct one must fit the
1769: data worse. Indeed, in their modeling of high signal-to-noise
1770: integral-field data of NGC 2974, \citet{kra05} already observe that
1771: the differences between \sch models with different inclinations are
1772: smaller than the differences between the best-fitting model and the
1773: data, which they interpret as indication of a fundamental degeneracy
1774: in the recovery of the inclination with three-integral
1775: models. Numerical noise, on the other hand, is a consequence of \sch
1776: models being in the end only discrete representations of a smooth,
1777: continuous distribution of possible orbits, and it could be argued
1778: that this discreteness might have a more negative effect for high
1779: inclinations. For example, even a simple and smooth circular orbit
1780: presents cusps or discontinuities when viewed close to edge-on. The
1781: turning points of such an orbit may get smoothed out differently for
1782: different inclinations.
1783: 
1784: The issue of degeneracy, nevertheless, can be avoided in those cases
1785: where the inclination is known to be uniquely determined by the
1786: data. This is the case, e.g., in the situations where the following
1787: conditions are met: (1) the kinematical dataset consists of proper
1788: motion measurements and LOS velocities, (2) the stellar system is
1789: reasonably close to axisymmetric, and (3) there exists an independent
1790: measurement of the distance $D$ to the system. As first used in
1791: practice by \citet{glenn06}, the inclination then follows directly
1792: from the following relationship between the mean LOS velocity (in
1793: units of \kms) and the mean proper motion along the short axis (in
1794: units of mas\,yr$^{-1}$),
1795: %
1796: \begin{equation}
1797: \label{eq:inclination}
1798: \langle\,v_{z'}\,\rangle = 4.74\,\,D\,\tan i\,\,\langle\,\mu_{y'}\,\rangle, 
1799: \end{equation}
1800: %
1801: where $D$ is the distance in kpc, and the brackets denote an
1802: integration along the line-of-sight. This relation is true at each
1803: projected position $(x',y')$ in any axisymmetric system, and has been
1804: successfully applied to the Galactic globular clusters $\omega$ Cen
1805: and M15 \citep{glenn06,bos06}.
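
The numerical factor follows from unit conversion: a proper motion of
1\,mas\,yr$^{-1}$ at a distance of 1\,kpc corresponds to a transverse
velocity of 4.74\kms. Solving equation (\ref{eq:inclination}) for the
inclination gives
%
\begin{displaymath}
i = \arctan\left(\frac{\langle\,v_{z'}\,\rangle}
{4.74\,D\,\langle\,\mu_{y'}\,\rangle}\right).
\end{displaymath}
%
As a worked example with made-up numbers (chosen only for
illustration, not taken from any dataset), $D=5$\kpc,
$\langle\,\mu_{y'}\,\rangle=0.2$ mas\,yr$^{-1}$, and
$\langle\,v_{z'}\,\rangle=5$\kms\ give
$\tan i = 5/(4.74\times5\times0.2)\simeq1.05$, i.e.,
$i\simeq46.5\grad$.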
1806: 
1807: Here, in order to explore the applicability of this simple
1808: relationship, we take advantage of our a priori knowledge of the
1809: correct inclination for our simulated datasets, and study the
1810: circumstances under which the use of equation (\ref{eq:inclination})
1811: provides an accurate result. Unlike the case of integrated light
1812: measurements (where $\langle\,v_{z'}\,\rangle$ is simply the average
1813: of the LOSVD at any given projected position on the sky), in the
1814: context of discrete datasets neither $\langle\,v_{z'}\,\rangle$ nor
1815: $\langle\,\mu_{y'}\,\rangle$ are quantities that can be rigorously
1816: obtained from the data at any given $(x',y')$. Both quantities may,
1817: nevertheless, be approximated by averaging a number of kinematic
1818: measurements that fall within one or more apertures of a given size
1819: around projected positions $(x',y')$. Following this, we applied
1820: equation (\ref{eq:inclination}) to a series of subsets of our 6
1821: simulated datasets with varying number of kinematic measurements, and
1822: verified that indeed the correct inclination is reproduced provided:
1823: (a) the system is rotating (otherwise, while the relation is still
1824: valid, both averages are nearly zero and hence the inclination is not
1825: really constrained); (b) most of the datapoints are not located close
1826: to the minor axis (where rotation velocities are too small); and (c)
1827: the averages are computed from a sufficiently large number of
1828: kinematical measurements (so that the error in $\tan i$ is not too
1829: large). These are conditions that are certainly fulfilled by datasets
1830: on some Galactic globular clusters, currently the only class of
stellar system for which three-dimensional kinematic information is
available. Therefore, in those cases, equation (\ref{eq:inclination})
1833: can be safely applied. The \sch modeling can then concentrate on
1834: recovering the more interesting properties such as the orbital
1835: structure and mass-to-light ratios, which we have shown are
1836: successfully recovered when the inclination is assumed known.
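
Condition (c) can be made quantitative with standard first-order error
propagation, under the simplifying assumption (made here only for
illustration) that the errors on the two averages and on the distance
are independent. Writing $\sigma_v$, $\sigma_\mu$, and $\sigma_D$ for
the uncertainties in $\langle\,v_{z'}\,\rangle$,
$\langle\,\mu_{y'}\,\rangle$, and $D$, equation
(\ref{eq:inclination}) gives
%
\begin{displaymath}
\frac{\sigma_{\tan i}}{\tan i} \simeq
\left[ \left(\frac{\sigma_v}{\langle\,v_{z'}\,\rangle}\right)^2 +
\left(\frac{\sigma_\mu}{\langle\,\mu_{y'}\,\rangle}\right)^2 +
\left(\frac{\sigma_D}{D}\right)^2 \right]^{1/2},
\qquad
\sigma_i \simeq \frac{\sin 2i}{2}\,\frac{\sigma_{\tan i}}{\tan i},
\end{displaymath}
%
with $\sigma_i$ in radians. The fractional terms diverge when either
average approaches zero, which is the situation described in
conditions (a) and (b), while $\sigma_v$ and $\sigma_\mu$ shrink
roughly as the inverse square root of the number of measurements
averaged.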
1837: 
1838: To better understand the problem of numerical noise, we explored the
1839: dependence of the results on the size of the orbit library used to
1840: construct the \sch models. We did this for cases with 1000 datapoints
1841: with complete three-dimensional velocities, so that because of
1842: equation (\ref{eq:inclination}) we know that there is no theoretical
1843: degeneracy in inclination. Figure \ref{fig:grid_55_90} shows the
1844: $\Delta\chi^2$ contours resulting from fits of \sch models using our
1845: standard library of $20\times14\times7$ orbits (upper panels; same
1846: library size as in Figure~\ref{fig:55is_LOS_MU}) in comparison with
1847: fits that use a library 8 times larger, i.e., one with
1848: $40\times28\times14$ orbits (lower panels). We show results for the
1849: 55is (left-hand panels) and 90is (right-hand panels) cases.
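
The factor of 8 follows from doubling the sampling in each of the
three integrals:
%
\begin{displaymath}
20\times14\times7 = 1960, \qquad
40\times28\times14 = 15680 = 2^3 \times 1960 ,
\end{displaymath}
%
where the products count combinations of $(E,L_z,I_3)$.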
1850: 
In three of the four panels of Figure \ref{fig:grid_55_90}, the best-fit
mass-to-light ratio is within $1\sigma$ of the input value
$\Upsilon_0^*$; the exception is the 55is case with the bigger
library (lower left), where they agree at the $2\sigma$ level. The
1855: size of the confidence regions on the mass-to-light ratio does not
1856: change significantly when the orbit library is increased in size.
1857: Therefore, we conclude that libraries of $20\times14\times7$ orbits
1858: are large enough to properly constrain the mass-to-light ratio
1859: (provided that one uses regularization as we do here; see
1860: \citealt{cre04}). This provides further justification for our use of
1861: this library size in Section~\ref{sec:getML}.
1862: 
1863: The top left panel in Figure \ref{fig:grid_55_90} is directly
1864: comparable to the two panels of Figure \ref{fig:55is_LOS_MU}, but now
1865: with three components of velocities observed, instead of just one or
1866: two, respectively. Consistent with the results in
1867: Figures~\ref{fig:errors_ML} and~\ref{fig:55is_LOS_MU}, we see that the
1868: addition of an extra component of velocity decreases the size of the
1869: confidence regions. More interestingly, a secondary minimum in $\Delta
1870: \chi^2$ appears close to the $(i,\Upsilon)$ values for the correct
1871: input model. This suggests that indeed all three components of
1872: velocity may be necessary to uniquely constrain the inclination of an
1873: axisymmetric stellar system. The bottom left panel shows the effect of
1874: increasing the orbit library size. There is now only a single minimum,
1875: centered at an inclination that agrees with the input value at the
1876: $\sim 2\sigma$ level.
1877: 
1878: The right panels in Figure \ref{fig:grid_55_90} show the situation for
1879: the 90is case. With the small library (top right), the best-fit
inclination is at $i\approx 70\grad$, far from the input
1881: value. When the orbit library size is increased (lower right), the
1882: best-fit shifts to $i=80\grad$. This is only $10\grad$ from the
1883: correct input value, which may well be acceptable for many realistic
1884: applications. On the other hand, the best fit and the input value are
inconsistent at the many-sigma level, which is certainly reason for
1886: some concern. A possible cause for this is that the turning points of
1887: orbits in edge-on systems have very sharp edges in
1888: projection. Therefore, larger grid sizes than we have used may be
1889: necessary to correctly represent them in all the necessary detail.
1890: However, we have not explored this further for two reasons. First,
1891: information on all three velocity components may be necessary to be
1892: able to uniquely constrain the inclination. If that is available, then
1893: use of equation~(\ref{eq:inclination}) will be more accurate and
1894: efficient than use of Schwarzschild modeling. Second, in practice one
1895: is generally much more interested in the mass distribution than in the
1896: inclination. Figure \ref{fig:grid_55_90} shows that the mass-to-light
1897: ratio is correctly recovered, even when the inclination is
1898: systematically biased.
1899: 
1900: In conclusion, our tests demonstrate that the recovery of the most
1901: important properties of the system (its orbital structure and the
1902: mass-to-light ratio) by our discrete \sch models is robust. Correct
1903: recovery of the inclination appears to be the most complicated aspect
1904: of the modeling. Sufficient observational data must be available and a
1905: large enough orbit library must be used. Our code can then adequately
1906: recover the inclination of sufficiently inclined systems. However, for
1907: edge-on systems there remains a systematic inclination bias of $\sim
1908: 10\grad$ that we have been unable to resolve. This is the primary
1909: shortcoming of our new approach that was unearthed by the pseudo-data
1910: tests that we have presented. This may be a generic property of
1911: Schwarzschild codes, since other authors have also reported
1912: difficulties in recovering inclinations. Either way, this is not
1913: believed to be a significant limitation for most potential practical
1914: applications of our code.
1915: 
1916: \section{Summary and conclusions}
1917: \label{sec:end}
1918: 
1919: Discrete kinematic datasets, composed of velocities of individual
1920: tracers (e.g., red giants, planetary nebulae, globular clusters,
1921: galaxies, etc.), are routinely being assembled for a variety of
stellar systems of all scales (\S\,\ref{sec.intro}). These are not
limited to LOS-velocity surveys: high-quality proper-motion databases
1924: already exist for Galactic globular clusters, and future facilities
1925: hold the promise of providing the same for stars in the nearest
1926: galaxies. However, the most sophisticated tools typically being used
1927: in the modeling of these observations were actually developed for the
analysis of kinematic data in the form of LOSVDs, a rather different
type of velocity information from the velocities of individual
kinematic tracers measured one by one. As a consequence, the
1931: information content of any particular dataset of a discrete nature is
1932: likely not being fully exploited. We thus have developed a specific
1933: tool for the modeling of discrete datasets, which we have presented in
1934: this paper along with detailed tests of its performance based on the
1935: modeling of simulated data.
1936: 
1937: The new tool consists of a \sch orbit-superposition code that, adapted
1938: from the implementation of \citet{vdm98}, can handle any number of
1939: (one-, two-, or three-dimensional) velocities of individual kinematic
1940: tracers without relying on any binning of the data. Under the only
1941: assumptions that the system is in steady-state equilibrium (i.e., the
1942: gravitational potential is not changing in time) and may be well
1943: approximated as axisymmetric, the code finds the distribution function
1944: (a function of the three integrals of motion $E$, $L_z$, and $I_3$)
1945: that best reproduces the observations (the velocities of the tracers
1946: as well as the overall light distribution) in a given potential. The
1947: fact that the distribution function is free to have any dependence on
1948: the three integrals of motion allows for a very general description of
1949: the orbital structure, thus avoiding common restrictive assumptions
1950: about the degree of (an)isotropy of the orbits.
1951: 
1952: Unlike previous implementations of the \sch technique, we cast the
1953: problem of finding the best superposition of orbits using a
1954: probabilistic approach, i.e., by building a likelihood function
1955: representing the probability that the entire set of measurements would
1956: have been observed assuming a particular form for the gravitational
1957: potential (\S\,\ref{sec:logL}). In this case, and in contrast with the
1958: old continuous versions, the dependence of the likelihood function on
the orbital weights is non-linear, and the optimization problem cannot
be reduced to a linear matrix equation. Instead, it becomes a
problem of maximizing a likelihood with respect to the set of
weights associated with all possible combinations of the integrals
1963: $(E,L_z,I_3)$ that comprise the orbit library (\S\,\ref{sec:logL}),
1964: and which accounts for the observed positions and (any-dimensional)
1965: velocities of all particles in the dataset, including their
1966: uncertainties (\S\,\ref{sec:pij}). After extensive testing, a
1967: conjugate gradient algorithm was found to converge satisfactorily to
1968: the correct solution and was adopted for the remaining tests of the
1969: code's overall performance (\S\,\ref{sec.mkfitin}).
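
Schematically, and as a simplified sketch rather than the full
expression of \S\,\ref{sec:logL} (normalization terms are omitted, and
the symbols $w_j$ and $p_{ij}$ are used here only for illustration),
if $p_{ij}$ denotes the probability that orbit $j$ produces the
measurement of particle $i$ and $w_j \ge 0$ is the weight of orbit
$j$, the log-likelihood has the form
%
\begin{displaymath}
\ln L = \sum_i \ln \Bigl( \sum_j w_j\,p_{ij} \Bigr),
\qquad
\frac{\partial \ln L}{\partial w_k} =
\sum_i \frac{p_{ik}}{\sum_j w_j\,p_{ij}} .
\end{displaymath}
%
Because the weights enter inside the logarithm, each derivative
depends on all of the weights, so the maximum cannot be obtained from
a single linear matrix equation and an iterative scheme is required.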
1970: 
1971: In order to assess the reliability of our discrete \sch code, we
1972: applied it to several sets of simulated data, i.e., artificially
1973: generated kinematic observations obtained from a model of an
1974: axisymmetric galaxy of which the orbital structure, mass distribution,
1975: and inclination are known in advance. Pseudo-datasets were generated
1976: from a two-integral phase-space distribution function with varying
1977: degrees of overall rotation, types of velocity information (only-LOS,
1978: only proper motions, and both), total number of particles, and for two
1979: different inclinations on the plane of the sky (\S\,\ref{sec:data}).
1980: 
1981: Using the various simulated datasets, we studied the recovery of the
1982: input orbital structure or DF, mass-to-light ratio, and
1983: inclination. For the purposes of these tests, we assumed complete
1984: knowledge of the radial profile of the underlying mass distribution
1985: and a mass-to-light ratio that remains constant as a function of
1986: radius. These restrictions are easily (and must be) lifted when
1987: modeling data on real systems, in which case one needs to explore a
1988: range of plausible underlying potentials and allow for variations of
1989: the mass-to-light ratio to properly account for the possibility of
1990: central black holes and dark halos.
1991: 
1992: Inside the region constrained by data, we find that the distribution
1993: function (represented by the corresponding distributions of orbital
1994: mass weights) and streaming characteristics of the input datasets are
1995: satisfactorily recovered by the \sch fits when the correct inclination
1996: and mass-to-light ratio are known (Figs. \ref{fig:1Dplots} to
1997: \ref{fig:Ebins55is}). As measured by the mean absolute deviations
1998: between the integrated weight distributions, the agreement between the
1999: fitted and the input orbital weight distributions as a function of
2000: $E$, $L_z$, and $I_3$ is typically of the order of 3\%, 10\%, and
2001: 20\%, respectively (the numbers for our worst case being 5\%, 16\%,
2002: and 25\%). When eliminating the dependence on $I_3$, the agreement
2003: between the fitted and input $E-L_z$ distributions is of the order of
2004: 15\%, with the net rotational behavior of the input datasets cleanly
2005: recovered (Figs.  \ref{fig:2Dplot55ns} and
2006: \ref{fig:2Dplot55is}). Thus, we conclude that the discrete \sch code
2007: can successfully recover the orbital structure of the system under
2008: study.
2009: 
2010: Assuming that the inclination of the system on the plane of the sky is
2011: known, we quantified the recovery of the input mass-to-light ratio as
2012: a function of the size of the input dataset (Fig. \ref{fig:MLparabN})
2013: and of the type of kinematic information available
(Fig. \ref{fig:MLparab2}). We studied both the best-fit value and
the uncertainty in its determination
2016: (Fig. \ref{fig:errors_ML}). The statistical expectation of better
2017: results when the amount of observational information is larger (either
2018: regarding the number of datapoints or the number of velocity
2019: components) is clearly reproduced by our discrete \sch models. For the
2020: smallest datasets used in our testing ($N=100$), and regardless of
2021: whether using only-LOS velocities, only proper motions, or both, the
2022: best-fit mass-to-light ratio is within 5-10\% of the input value, with
2023: formal $1\sigma$ uncertainties of the order of 15\%. When increasing
2024: either the number of available measurements or the number of measured
2025: velocity components, the mass-to-light ratio is always recovered to
2026: better than $\sim 10\%$ accuracy, with the corresponding random
2027: ($1\sigma$) uncertainties in the range of 5-10\%. The discrete \sch
2028: code, therefore, recovers the mass-to-light ratio of the input
2029: datasets to satisfactory levels of accuracy.
2030: 
2031: The recovery of both the mass-to-light ratio and inclination when
neither of these quantities is known in advance (as is usually the
2033: case with real observations) was studied using a grid of discrete \sch
2034: models, exploring also the dependence on the type of velocity
components available (Fig. \ref{fig:55is_LOS_MU}). We find that the
mass-to-light ratio is again successfully recovered, but the best-fit
inclination is not identified correctly using small orbit
libraries. We find that this is largely remedied by better sampling the
2039: available $(E,L_z,I_3)$ integral space using a larger orbit library
2040: (Fig. \ref{fig:grid_55_90}). For our input datasets with $i=55\grad$,
2041: the best-fit inclination obtained by our models with a large orbit
2042: library is $57\grad$, while for input datasets with $i=90\grad$ we
2043: obtain a best-fit model with $i=80\grad$. Given the known difficulty
2044: of \sch models in general for determining the inclination of stellar
2045: systems, and considering the low relative importance of this parameter
2046: compared to other properties such as the orbital structure and the
2047: mass-to-light ratio, we regard this small disagreement for the high
2048: inclination datasets as acceptable.
2049: 
In summary, we have shown that our new \sch code handles modern
datasets composed of discrete measurements of kinematic tracers
without any loss of information due to data binning and without
restrictive assumptions on the distribution function, and that it
is able to constrain satisfactorily the orbital structure,
2055: mass-to-light ratio, and inclination of the system under
2056: study. Applications to data for Galactic globular clusters and nearby
2057: dE galaxies will be presented in future papers. These are only two
2058: examples of a large range of dynamical problems in astronomy to which
2059: a discrete \sch code like ours can be applied, so we expect this new
2060: tool will contribute to the better understanding of stellar systems in
2061: general.
2062: 
2063: 
2064: 
2065: 
2066: %\begin{acknowledgements}
2067: %Thanks to  ....
2068: %\end{acknowledgements}
2069: %
2070: %\section{Acknowledgements}
2071: %\label{sec:gracias}
2072: 
2073: \acknowledgements
2074: 
2075: We are happy to thank Marla Geha and Raja Guhathakurta for their
2076: continued interest in the present work and its extension to the study
2077: of actual galaxies using their unique data on dwarf ellipticals. We
2078: also thank Glenn van de Ven for very useful discussions, his interest
2079: in the progress of this project and, last but not least, for his
invaluable help with IDL routines. This paper also benefited from
2081: comments from Davor Krajnovic, Aaron Romanowsky, and David
2082: Merritt. Thanks also to George Meylan for his help with the writing of
2083: the HST Theory proposal specified below, and to the anonymous referee,
2084: whose comments and suggestions improved the presentation of the
2085: paper. This work was carried out as part of HST Theory Project \#9952
2086: and was supported by NASA through a grant from STScI, which is
2087: operated by AURA, Inc., under NASA contract NAS 5-26555.
2088: 
2089: 
2090: 
2091: 
2092: 
2093: %% Appendix material should be preceded with a single \appendix command.
2094: %% There should be a \section command for each appendix. Mark appendix
2095: %% subsections with the same markup you use in the main body of the paper.
2096: 
2097: %% Each Appendix (indicated with \section) will be lettered A, B, C, etc.
2098: %% The equation counter will reset when it encounters the \appendix
2099: %% command and will number appendix equations (A1), (A2), etc.
2100: 
2101: %\appendix
2102: 
2103: %\section{Appendix material}
2104: 
2105: \begin{thebibliography}{}
2106: 
2107: \bibitem[Batsleer \& Dejonghe(1993)]{bat93} Batsleer, P., \& Dejonghe, H.\ 1993, \aap, 271, 104
2108: \bibitem[Batsleer \& Dejonghe(1995)]{bat95} Batsleer, P., \& Dejonghe, H.\ 1995, \aap, 294, 693
2109: \bibitem[Bender et al.(2005)]{ben05} Bender, R., et al.\ 2005, \apj, 631, 280
2110: \bibitem[Binney \& Mamon(1982)]{bin82} Binney, J., \& Mamon, G.~A.\ 1982, \mnras, 200, 361
2111: \bibitem[Binney \& Tremaine(1987)]{bt87} Binney, J., \& Tremaine, S.\ 1987, Princeton, NJ, Princeton University Press, 1987
2112: \bibitem[Cappellari et al.(2002)]{cap02} Cappellari, M., Verolme, E.~K., van der Marel, R.~P., Kleijn, G.~A.~V., Illingworth, G.~D., Franx, M., Carollo, C.~M., \& de Zeeuw, P.~T.\ 2002, \apj, 578, 787
2113: \bibitem[Cappellari et al.(2006)]{cap06} Cappellari, M., et al.\ 2006, \mnras, 366, 1126
2114: \bibitem[C{\^o}t{\'e} et al.(2001)]{cote01} C{\^o}t{\'e}, P., et al.\ 2001, \apj, 559, 828
2115: \bibitem[C{\^o}t{\'e} et al.(2003)]{cote03} C{\^o}t{\'e}, P., McLaughlin, D.~E., Cohen, J.~G., \& Blakeslee, J.~P.\ 2003, \apj, 591, 850
2116: \bibitem[Cretton et al.(1999)]{cre99} Cretton, N., de Zeeuw, P.~T., van der Marel, R.~P., \& Rix, H.-W.\ 1999, \apjs, 124, 383
2117: \bibitem[Cretton et al.(2000)]{cre00} Cretton, N., Rix, H.-W., \& de Zeeuw, P.~T.\ 2000, \apj, 536, 319
2118: \bibitem[Cretton \& Emsellem(2004)]{cre04} Cretton, N., \& Emsellem, E.\ 2004, \mnras, 347, L31
2119: \bibitem[Davies et al.(2006)]{dav06} Davies, R.~I., et al.\ 2006, \apj, 646, 754
2120: \bibitem[Dehnen \& Gerhard(1994)]{deh94} Dehnen, W., \& Gerhard, O.~E.\ 1994, \mnras, 268, 1019
2121: \bibitem[Dekel et al.(2005)]{dek05} Dekel, A., Stoehr, F., Mamon, G.~A., Cox, T.~J., Novak, G.~S., \& Primack, J.~R.\ 2005, \nat, 437, 707
2122: \bibitem[de Bruijne et al.(1996)]{deb96} de Bruijne, J.~H.~J., van der Marel, R.~P., \& de Zeeuw, P.~T.\ 1996, \mnras, 282, 909
2123: \bibitem[de Lorenzi et al.(2007)]{nmagic} de Lorenzi, F., Debattista, V.~P., Gerhard, O., \& Sambhus, N.\ 2007, \mnras, 376, 71
2124: \bibitem[Douglas et al.(2002)]{dou02} Douglas, N.~G., et al.\ 2002, \pasp, 114, 1234
2125: \bibitem[Douglas et al.(2007)]{dou07} Douglas, N.~G., et al.\ 2007, arXiv:astro-ph/0703047
2126: \bibitem[Gebhardt et al.(1997)]{geb97} Gebhardt, K., Pryor, C., Williams, T.~B., Hesser, J.~E., \& Stetson, P.~B.\ 1997, \aj, 113, 1026
2127: \bibitem[Gebhardt et al.(2000)]{geb00} Gebhardt, K., et al.\ 2000, \aj, 119, 1157
2128: \bibitem[Gebhardt et al.(2003)]{geb03} Gebhardt, K., et al.\ 2003, \apj, 583, 92
2129: \bibitem[Geha et al.(2006)]{geh06} Geha, M., Guhathakurta, P., Rich, R.~M., \& Cooper, M.~C.\ 2006, \aj, 131, 332
2130: \bibitem[Gerhard(1993)]{ger93} Gerhard, O.~E.\ 1993, \mnras, 265, 213
2131: \bibitem[Gerhard et al.(1998)]{ger98} Gerhard, O., Jeske, G., Saglia, R.~P., \& Bender, R.\ 1998, \mnras, 295, 197
2132: \bibitem[Gerssen et al.(2002)]{ger02} Gerssen, J., van der Marel, R.~P., Gebhardt, K., Guhathakurta, P., Peterson, R.~C., \& Pryor, C.\ 2002, \aj, 124, 3270
2133: \bibitem[Gilbert et al.(2006)]{gil06} Gilbert, K.~M., et al.\ 2006, \apj, 652, 1188
2134: \bibitem[Gilbert et al.(2007)]{gil07} Gilbert, K.~M., et al.\ 2007, \apj, 668, 245
2135: \bibitem[Kleyna et al.(2001)]{jan01} Kleyna, J.~T., Wilkinson, M.~I., Evans, N.~W., \& Gilmore, G.\ 2001, \apjl, 563, L115
2136: \bibitem[Kleyna et al.(2002)]{jan02} Kleyna, J., Wilkinson, M.~I., Evans, N.~W., Gilmore, G., \& Frayn, C.\ 2002, \mnras, 330, 792
2137: \bibitem[Krajnovi{\'c} et al.(2005)]{kra05} Krajnovi{\'c}, D., Cappellari, M., Emsellem, E., McDermid, R.~M., \& de Zeeuw, P.~T.\ 2005, \mnras, 357, 1113
2138: \bibitem[Kunkel et al.(1997)]{bill97} Kunkel, W.~E., Demers, S., Irwin, M.~J., \& Albert, L.\ 1997, \apjl, 488, L129
2139: \bibitem[Kunkel et al.(2000)]{bill00} Kunkel, W.~E., Demers, S., \& Irwin, M.~J.\ 2000, \aj, 119, 2789
2140: \bibitem[{\L}okas(2002)]{lok02} {\L}okas, E.~L.\ 2002, \mnras, 333, 697
2141: \bibitem[{\L}okas \& Mamon(2003)]{lok03} {\L}okas, E.~L., \& Mamon, G.~A.\ 2003, \mnras, 343, 401
2142: \bibitem[{\L}okas et al.(2005)]{lok05} {\L}okas, E.~L., Mamon, G.~A., \& Prada, F.\ 2005, \mnras, 363, 918
2143: \bibitem[Magorrian(2006)]{mag06} Magorrian, J.\ 2006, \mnras, 373, 425
2144: \bibitem[Magorrian et al.(1998)]{mag98} Magorrian, J., et al.\ 1998, \aj, 115, 2285
2145: \bibitem[Mayor et al.(1997)]{may97} Mayor, M., et al.\ 1997, \aj, 114, 1087
2146: \bibitem[McLaughlin et al.(2006)]{mcl06} McLaughlin, D.~E., Anderson, J., Meylan, G., Gebhardt, K., Pryor, C., Minniti, D., \& Phinney, S.\ 2006, \apjs, 166, 249
2147: \bibitem[McNamara et al.(2003)]{mcn03} McNamara, B.~J., Harrison, T.~E., \& Anderson, J.\ 2003, \apj, 595, 187
2148: \bibitem[Merritt(1993)]{mer93} Merritt, D.\ 1993, \apj, 413, 79
2149: \bibitem[Merritt \& Saha(1993)]{mer93a} Merritt, D., \& Saha, P.\ 1993, \apj, 409, 75
2150: \bibitem[Merritt et al.(1997)]{mer97} Merritt, D., Meylan, G., \& Mayor, M.\ 1997, \aj, 114, 1074
2151: \bibitem[Merritt(1999)]{mer99} Merritt, D.\ 1999, \pasp, 111, 129
2152: \bibitem[Press et al.(1992)]{recipes} Press, W.~H., Teukolsky, S.~A., Vetterling, W.~T., \& Flannery, B.~P.\ 1992, Numerical Recipes (2nd ed.; Cambridge: Cambridge Univ. Press)
2153: \bibitem[Reijns et al.(2006)]{rei06} Reijns, R.~A., Seitzer, P., Arnold, R., Freeman, K.~C., Ingerson, T., van den Bosch, R.~C.~E., van de Ven, G., \& de Zeeuw, P.~T.\ 2006, \aap, 445, 503
2154: \bibitem[Richstone \& Tremaine(1984)]{ric84} Richstone, D.~O., \& Tremaine, S.\ 1984, \apj, 286, 27
2155: \bibitem[Richtler et al.(2004)]{tom04} Richtler, T., et al.\ 2004, \aj, 127, 2094
2156: \bibitem[Rix et al.(1997)]{rix97} Rix, H.-W., de Zeeuw, P.~T., Cretton, N., van der Marel, R.~P., \& Carollo, C.~M.\ 1997, \apj, 488, 702
2157: \bibitem[Romanowsky \& Kochanek(2001)]{rom01} Romanowsky, A.~J., \& Kochanek, C.~S.\ 2001, \apj, 553, 722
2158: \bibitem[Romanowsky et al.(2003)]{rom03} Romanowsky, A.~J., Douglas, N.~G., Arnaboldi, M., Kuijken, K., Merrifield, M.~R., Napolitano, N.~R., Capaccioli, M., \& Freeman, K.~C.\ 2003, Science, 301, 1696
2159: \bibitem[Schwarzschild(1979)]{sch79} Schwarzschild, M.\ 1979, \apj, 232, 236
2160: \bibitem[Shanno \& Phua(1980)]{sha80} Shanno, D.~F., \& Phua, K.~H.\ 1980, ACM Transactions on Mathematical Software, 6, 618
2161: \bibitem[Simon \& Geha(2007)]{sim07} Simon, J.~D., \& Geha, M.\ 2007, \apj, 670, 313
2162: \bibitem[Suntzeff \& Kraft(1996)]{sun96} Suntzeff, N.~B., \& Kraft, R.~P.\ 1996, \aj, 111, 1913
2163: \bibitem[Syer \& Tremaine(1996)]{m2m} Syer, D., \& Tremaine, S.\ 1996, \mnras, 282, 223
2164: \bibitem[Teodorescu et al.(2005)]{teo05} Teodorescu, A.~M., M{\'e}ndez, R.~H., Saglia, R.~P., Riffeser, A., Kudritzki, R.-P., Gerhard, O.~E., \& Kleyna, J.\ 2005, \apj, 635, 290
2165: \bibitem[Thomas et al.(2004)]{tho04} Thomas, J., Saglia, R.~P., Bender, R., Thomas, D., Gebhardt, K., Magorrian, J., \& Richstone, D.\ 2004, \mnras, 353, 391
2166: \bibitem[Valluri et al.(2004)]{val04} Valluri, M., Merritt, D., \& Emsellem, E.\ 2004, \apj, 602, 66
2167: \bibitem[van de Ven et al.(2006)]{glenn06} van de Ven, G., van den Bosch, R.~C.~E., Verolme, E.~K., \& de Zeeuw, P.~T.\ 2006, \aap, 445, 513
2168: \bibitem[van den Bosch et al.(2006)]{bos06} van den Bosch, R., de Zeeuw, T., Gebhardt, K., Noyola, E., \& van de Ven, G.\ 2006, \apj, 641, 852
2169: \bibitem[van der Marel \& Franx(1993)]{vdm93} van der Marel, R.~P., \& Franx, M.\ 1993, \apj, 407, 525
2170: \bibitem[van der Marel et al.(1994)]{vdm94} van der Marel, R.~P., Evans, N.~W., Rix, H.-W., White, S.~D.~M., \& de Zeeuw, P.~T.\ 1994, \mnras, 271, 99
2171: \bibitem[van der Marel et al.(1997a)]{vdm97a} van der Marel, R.~P., de Zeeuw, P.~T., \& Rix, H.-W.\ 1997, \apj, 488, 119
2172: \bibitem[van der Marel et al.(1997b)]{vdm97b} van der Marel, R.~P., Sigurdsson, S., \& Hernquist, L.\ 1997, \apj, 487, 153
2173: \bibitem[van der Marel et al.(1998)]{vdm98} van der Marel, R.~P., Cretton, N., de Zeeuw, P.~T., \& Rix, H.-W.\ 1998, \apj, 493, 613
2174: \bibitem[van der Marel et al.(2000)]{vdm00} van der Marel, R.~P., Magorrian, J., Carlberg, R.~G., Yee, H.~K.~C., \& Ellingson, E.\ 2000, \aj, 119, 2038
2175: \bibitem[van der Marel et al.(2002)]{vdm02} van der Marel, R.~P., Alves, D.~R., Hardy, E., \& Suntzeff, N.~B.\ 2002, \aj, 124, 2639
2176: \bibitem[van der Marel \& van Dokkum(2006)]{vdm06} van der Marel, R.~P., \& van Dokkum, P.~G.\ 2006, arXiv:astro-ph/0611571
2177: \bibitem[Vandervoort(1984)]{voort84} Vandervoort, P.~O.\ 1984, \apj, 287, 475
2178: \bibitem[van Leeuwen et al.(2000)]{vleu00} van Leeuwen, F., Le Poole, R.~S., Reijns, R.~A., Freeman, K.~C., \& de Zeeuw, P.~T.\ 2000, \aap, 360, 472
2179: \bibitem[Verolme \& de Zeeuw(2002)]{ver02} Verolme, E.~K., \& de Zeeuw, P.~T.\ 2002, \mnras, 331, 959
2180: \bibitem[Walker et al.(2006)]{wal06} Walker, M.~G., Mateo, M., Olszewski, E.~W., Bernstein, R., Wang, X., \& Woodroofe, M.\ 2006, \aj, 131, 2114
2181: \bibitem[Wilkinson et al.(2002)]{wil02} Wilkinson, M.~I., Kleyna, J.~T., Evans, N.~W., \& Gilmore, G.\ 2002, \mnras, 330, 778
2182: \bibitem[Wilkinson et al.(2004)]{wil04} Wilkinson, M.~I., Kleyna, J.~T., Evans, N.~W., Gilmore, G.~F., Irwin, M.~J., \& Grebel, E.~K.\ 2004, \apjl, 611, L21
2183: \bibitem[Wojtak \& {\L}okas(2007)]{woj07a} Wojtak, R., \& {\L}okas, E.~L.\ 2007, \mnras, 377, 843
2184: \bibitem[Wojtak et al.(2007)]{woj07b} Wojtak, R., {\L}okas, E.~L., Mamon, G.~A., Gottl{\"o}ber, S., Prada, F., \& Moles, M.\ 2007, \aap, 466, 437
2185: \bibitem[Wu(2007)]{wu07} Wu, X.\ 2007, arXiv:astro-ph/0702233
2186: \bibitem[Wu \& Tremaine(2006)]{wu06} Wu, X., \& Tremaine, S.\ 2006, \apj, 643, 210
2187: 
2188: \end{thebibliography}
2189: 
2190: 
2191: \clearpage
2192: 
2193: %\begin{landscape}
2194: %\rotate
2195: \begin{deluxetable}{ccrlcrlcrlcrlcrl}
2196: \tablewidth{0pc}
2197: %\tabletypesize{\tiny}
2198: \tabletypesize{\scriptsize}
2199: \tablecaption{Comparison between input and fitted orbital mass weights}
2200: \tablehead{
2201: \multicolumn{1}{c}{dataset} &
2202: \multicolumn{1}{c}{} &
2203: \multicolumn{2}{c}{no projection} &
2204: \multicolumn{1}{c}{} &
2205: \multicolumn{2}{c}{$I_3$} &
2206: \multicolumn{1}{c}{} &
2207: \multicolumn{2}{c}{$L_z,I_3$} &
2208: \multicolumn{1}{c}{} &
2209: \multicolumn{2}{c}{$E,I_3$} &
2210: \multicolumn{1}{c}{} &
2211: \multicolumn{2}{c}{$E,L_z$} \\
2212: \multicolumn{1}{c}{} &
2213: \multicolumn{1}{c}{} &
2214: \multicolumn{1}{c}{RMS} &
2215: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &
2216: \multicolumn{1}{c}{} &
2217: \multicolumn{1}{c}{RMS} &
2218: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &
2219: \multicolumn{1}{c}{} &
2220: \multicolumn{1}{c}{RMS} &
2221: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &
2222: \multicolumn{1}{c}{} &
2223: \multicolumn{1}{c}{RMS} &
2224: \multicolumn{1}{c}{$\mid{\rm med}\mid$} &
2225: \multicolumn{1}{c}{} &
2226: \multicolumn{1}{c}{RMS} &
2227: \multicolumn{1}{c}{$\mid{\rm med}\mid$}
2228: }
2229: \startdata
2230: 55ns & & 4.5138 & 0.4531 & & 0.2207 & 0.1404 & & 0.0908 & 0.0359 & & 0.1080 & 0.0719 & & 0.2634 & 0.1591 \\
2231: 55is & & 6.6525 & 0.5630 & & 0.3677 & 0.1915 & & 0.0939 & 0.0309 & & 0.1912 & 0.1598 & & 0.2460 & 0.1855 \\
2232: 55ms & & 3.1524 & 0.4208 & & 0.2123 & 0.1207 & & 0.0884 & 0.0334 & & 0.1845 & 0.0903 & & 0.2505 & 0.1122 \\
2233: 90ns & & 3.7560 & 0.5939 & & 0.2683 & 0.1633 & & 0.1419 & 0.0425 & & 0.2202 & 0.1365 & & 0.2361 & 0.2491 \\
2234: 90is & & 2.7956 & 0.6410 & & 0.6253 & 0.1629 & & 0.1366 & 0.0347 & & 0.4826 & 0.1178 & & 0.2566 & 0.2364 \\
2235: 90ms & & 1.4893 & 0.4927 & & 0.2298 & 0.1714 & & 0.1216 & 0.0384 & & 0.1456 & 0.1149 & & 0.2953 & 0.2333 \\
2236: %\\
2237: %55ns & & 8.7257 & 0.4920 & & 0.3179 & 0.1459 & & 0.0304 & 0.0173 & & 0.1512 & 0.1022 & & 0.5774 & 0.1500 \\
2238: %55is & & 9.3958 & 0.6237 & & 0.5400 & 0.2400 & & 0.0316 & 0.0171 & & 0.2450 & 0.2281 & & 0.7361 & 0.1604 \\
2239: %55ms & & 4.5211 & 0.4539 & & 0.2914 & 0.1731 & & 0.0305 & 0.0181 & & 0.2278 & 0.1553 & & 0.6633 & 0.1909 \\
2240: %90ns & & 4.1376 & 0.5866 & & 0.4036 & 0.1687 & & 0.0481 & 0.0375 & & 0.2173 & 0.1300 & & 0.7313 & 0.2064 \\
2241: %90is & & 5.8993 & 0.7236 & & 2.5684 & 0.2071 & & 0.0452 & 0.0324 & & 0.7889 & 0.1608 & & 0.7603 & 0.1658 \\
2242: %90ms & & 1.7923 & 0.5176 & & 0.3792 & 0.1960 & & 0.0290 & 0.0251 & & 0.1682 & 0.0896 & & 0.8509 & 0.1511 \\
2243: \enddata
2244: 
2245: \tablecomments{The tabulated numbers are the root mean square and
2246: median absolute deviation of the quantity $(\zeta_{\rm fit}-\zeta_{\rm
2247: in})/\zeta_{\rm in}$, i.e., the difference between fit and input mass
2248: weights normalized by the input mass weights, for \sch models based on
2249: our small ($20\times14\times7$) orbit library. The statistics are
2250: always computed inside the energy range constrained by the data (see
2251: Fig.~\ref{fig:1Dplots}), and are shown for the full cubes of mass
2252: weights (columns labeled ``no projection'') and for various
2253: projections of these cubes. The projected distributions are obtained
2254: by integrating over one or two of the integrals of motion (i.e., by
2255: collapsing the 3-D cubes in one or two dimensions), and appear under
2256: the columns labeled by the integral(s) of motion over which the
2257: integration has been done. }
2258: 
2259: \end{deluxetable}
2260: %\end{landscape}
2261: 
2262: 
2263: \clearpage
2264: 
2288: 
2296: 
2297: \clearpage
2298: 
2299: \begin{figure}
2300: %\epsscale{.80}
2301: \plotone{f1.eps}
2302: \caption{\label{fig:mkfitin} Maximization of the total likelihood $\ln
2303: L$ as a function of the number of function evaluations $N$, for a
2304: typical \sch fit using a dataset of 1000 discrete kinematic
2305: measurements with full 3-dimensional velocities (55is case;
2306: \S\,\ref{sec:data} and Figure \ref{fig:data55is}). Shown on the
2307: vertical axis is the change in the quantity $\lambda = -2\ln L$,
2308: denoted as $\delta\lambda$. This change becomes smaller as the
2309: optimization converges to a solution, approximately following the
2310: exponential relation illustrated by the dotted line. See discussion in
2311: \S\,\ref{sec.mkfitin}. }
2312: \end{figure}
2313: 
2314: \clearpage
2315: 
2316: \begin{figure}
2317: %\epsscale{.80}
2318: %\plotone{3_maps.from_hoegaarden.10e4.ps}
2319: \plotone{f2.eps}
2320: \caption{\label{fig:data55is} Three phase-space projections using
2321: 10,000 particles of the 55is simulated dataset ($i=55\grad$ with
2322: intermediate streaming). All input datasets have been constructed by
2323: randomly drawing discrete particles from a two-integral distribution
2324: function of the form $f(E,L_z)$, and are built so that, regardless of
2325: their true inclination, they have the same light distribution when
2326: projected on the plane of the sky. The coordinate system $(x',y',z')$
2327: represents the observer's system, with $(x',y')$ on the plane of the
2328: sky, and $z'$ the direction along the line-of-sight, defined positive
2329: away from the observer. The coordinates $r$, $v_{\theta}$, and
2330: $v_{\phi}$ correspond to the usual spherical coordinates intrinsic to
2331: the system. Spatial coordinates are in units of 8.7 arcsec, and
2332: velocities in units of 250 \kms. Note the asymmetry with respect to
2333: $v_{\phi}=0$ in the bottom-right panel, reflecting the net rotation of
2334: the 55is dataset. }
2335: \end{figure}
2336: 
2337: \clearpage
2338: 
2339: \begin{figure}
2340: \vspace*{-1cm}
2341: %\plotone{ELzI3_55is.ps}
2342: \plotone{f3.eps}
2343: %{\hspace{-15cm}
2344: %\plotfiddle{ELzI3_55is.ps}{1cm}{0}{400}{500}{-25}{0}}
2345: %\vspace*{-3cm}
2346: \caption{\label{fig:1Dplots} Integrated mass weights as a function of
2347: the three integrals of motion, $E$, $L_z$, and $I_3$, for the 55is
2348: dataset ($i=55^{\circ}$ with intermediate streaming) with 1000
2349: kinematic constraints and full 3-dimensional velocities (both LOS
2350: velocities and proper motions). The \sch fit (solid lines) was
2351: obtained using a library with $40\times28\times14$ orbits (our
2352: ``large'' library), and satisfactorily reproduces the mass
2353: distributions associated with the input dataset (dashed lines). The
2354: vertical dotted lines in the upper panel indicate the energy range
2355: constrained by the kinematic data.  The middle panel, with the mass
2356: distribution at positive $L_z$ always higher than that at negative
2357: $L_z$, reflects the net rotation of the 55is dataset. Note that, since
2358: we are showing orbital mass weights instead of the actual distribution
2359: function, the $I_3$ distributions in the bottom panel are not constant
2360: over $I_3$, even though the input distribution function is of the form
2361: $f(E,L_z)$.}
2362: \end{figure}
2363: 
2364: \clearpage
2365: 
2366: \begin{figure}
2367: \vspace*{-1cm}
2368: %\plotone{FIGURE_DF.55ns.model_vs_fit.En_Lz_2D.40x14x14_vs_20x7x7.ps}
2369: \plotone{f4.eps}
2370: %{\hspace{-15cm}
2371: %\plotfiddle{FIGURE_DF.55ns.model_vs_fit.En_Lz_2D.1st_version.40x14x14.ps}{1cm}{0}{400}{500}{-25}{0}}
2372: %\vspace*{-3cm}
2373: \caption{\label{fig:2Dplot55ns} Comparison of the input and fitted
2374: distributions of mass weights as a function of energy and $L_z$ for
2375: the 55ns dataset ($i=55^{\circ}$ with no streaming) with 1000 LOS
2376: velocities and proper motions. Only the energy range containing most
2377: of the total mass is shown. Upper panels show the weight distribution
2378: obtained by the \sch fit when the inclination and mass-to-light ratio
2379: are assumed to be known a priori, and the bottom panels show the
2380: weight distribution associated with the simulated input
2381: data. Right-hand panels show the results of the \sch code for an orbit
2382: library with $20\times14\times7$ orbits, while the left-hand panels
2383: show the results for a library 8 times bigger, with
2384: $40\times28\times14$ combinations of $(E,L_z,I_3)$. Black corresponds
2385: to zero weight, and the white (brightest) color in each pair of panels
2386: (fit and model, or upper and lower) has been assigned to the maximum
2387: orbital weight of the two panels, so that the comparison between
2388: fits and models is made using the same color scale. The images in this
2389: and subsequent figures are based on two-dimensional spline curves
2390: fitted to the gridded information. While the two bottom panels
2391: represent the same input data, their visualizations differ due to a
2392: different coarseness in the gridding of integral space.}
2393: \end{figure}
2394: 
2395: \clearpage
2396: 
2397: \begin{figure}
2398: \vspace*{-1cm}
2399: %\plotone{FIGURE_DF.55is.model_vs_fit.En_Lz_2D.40x14x14_vs_20x7x7.ps}
2400: \plotone{f5.eps}
2401: %{\hspace{-15cm}
2402: %\plotfiddle{FIGURE_DF.55ns.model_vs_fit.En_Lz_2D.1st_version.40x14x14.ps}{1cm}{0}{400}{500}{-25}{0}}
2403: %\vspace*{-3cm}
2404: \caption{\label{fig:2Dplot55is} Same as in Figure \ref{fig:2Dplot55ns} but for
2405: the 55is dataset ($i=55^{\circ}$ with intermediate streaming).
2406: }
2407: \end{figure}
2408: 
2409: \clearpage
2410: 
2411: \begin{figure}
2412: \vspace*{-1cm}
2413: %\plotone{FIGURE_DF.55is.model_vs_fit.10_E_bins.40x14x14.ps}
2414: \plotone{f6.eps}
2415: %{\hspace{-15cm}
2416: %\plotfiddle{FIGURE_DF.55is.model_vs_fit.10_E_bins.40x14x14.ps}{1cm}{0}{400}{500}{-25}{0}}
2417: %\vspace*{-3cm}
2418: \caption{\label{fig:Ebins55is} Input ($\zeta_{\rm in}$; bottom) and
2419: fitted ($\zeta_{\rm fit}$; top) distributions of orbital mass weights
2420: as a function of $L_z$ and $I_3$ at fixed (non-consecutive) values of
2421: energy, for the 55is case with 1000 kinematic measurements with LOS
2422: velocities and proper motions (i.e., the same case as depicted in
2423: Figure \ref{fig:2Dplot55is}). These results correspond to our orbit
2424: library with $40\times28\times14$ combinations of $(E,L_z,I_3)$. From
2425: left to right, the panels show the weight distribution at increasing
2426: distances from the center of the galaxy, as indicated at the top of
2427: each pair of panels by the value $R_c$ (in arcmin) of the circular
2428: orbit at the corresponding energy. The fraction (in \%) of the total
2429: mass contained in each energy slice is indicated at the bottom of each
2430: panel. As in Figures \ref{fig:2Dplot55ns} and \ref{fig:2Dplot55is},
2431: black corresponds to zero weight and the white (brightest) color in
2432: each pair of panels (fit and model, or upper and lower) has been
2433: assigned to the maximum orbital weight of the two panels, so that
2434: the comparison between fits and models is made using the same color
2435: scale. }
2436: \end{figure}
2437: 
2438: \clearpage
2439: 
2440: \begin{figure}
2441: \vspace*{-1cm}
2442: %\plotone{ML_parabolas.N.eps}
2443: \plotone{f7.eps}
2444: %{\hspace{-15cm}
2445: %\plotfiddle{ML_parabola.55is_90is.40x14x14.eps}{1cm}{0}{400}{500}{-25}{0}}
2446: %\vspace*{-3cm}
2447: \caption{\label{fig:MLparabN} $\Delta\chi^2$ parabolae that
2448: illustrate the recovery of the input mass-to-light ratio
2449: $\Upsilon_0^*$ as a function of the number of available kinematic
2450: measurements. All input datasets include both LOS velocities and
2451: proper motions, and all \sch models have been computed using our small
2452: orbit library, the one with $20\times14\times7$ combinations of the
2453: $(E,L_z,I_3)$ integrals of motion. For any given input dataset, the
2454: symbols show the $\Delta \chi^2$ obtained by the discrete \sch code on
2455: a number of \ml\, values distributed around the correct one
2456: ($\Upsilon_0^*$). The curves connecting the computed models are
2457: fifth-order polynomial fits. When the number of datapoints $N$ is
2458: smaller, the $\Delta \chi^2$ parabola is shallower, and the
2459: statistical uncertainty on the inferred $\Upsilon$ is larger. The
2460: lowest curve is shown at its actual $\Delta \chi^2$.  Each subsequent
2461: curve was offset vertically by a value of 40 for visual clarity.}
2462: \end{figure}
2463: 
2464: \clearpage
2465: 
2466: \begin{figure}
2467: \vspace*{-1cm}
2468: %\plotone{ML_parabolas.type.eps}
2469: \plotone{f8.eps}
2470: %{\hspace{-15cm}
2471: %\plotfiddle{ML_parabola.55is_90is.40x14x14.eps}{1cm}{0}{400}{500}{-25}{0}}
2472: %\vspace*{-3cm}
2473: \caption{\label{fig:MLparab2} $\Delta\chi^2$ parabolae illustrating
2474: the recovery of the input mass-to-light ratio $\Upsilon_0^*$ for
2475: datasets with different types of kinematic information. All input
2476: datasets are of the 55is case ($i=55\grad$ with intermediate
2477: streaming) with 1000 measurements. As in Figure \ref{fig:MLparabN},
2478: all \sch models have been computed using our small orbit library, with
2479: $20\times14\times7$ combinations of the $(E,L_z,I_3)$ integrals of
2480: motion.  When fewer velocity components are observed, the $\Delta
2481: \chi^2$ parabola is shallower, and the statistical uncertainty on the
2482: inferred $\Upsilon$ is larger. 
2483: %The lowest curve is shown at its actual
2484: %$\Delta \chi^2$.  Each subsequent curve was offset vertically by a
2485: %value of 50 for visual clarity.
2486: }
2487: \end{figure}
2488: 
2489: \clearpage 
2490: 
2491: \begin{figure}
2492: \vspace*{-1cm}
2493: %\plotone{errors_ML.eps}
2494: \plotone{f9.eps}
2495: %{\hspace{-15cm}
2496: %\plotfiddle{ML_parabola.55is_90is.40x14x14.eps}{1cm}{0}{400}{500}{-25}{0}}
2497: %\vspace*{-3cm}
2498: \caption{\label{fig:errors_ML} Uncertainties in the recovery of the
2499: input mass-to-light ratio $\Upsilon_0^*$ as a function of the number
2500: of available kinematic measurements, and for input datasets with
2501: varying types of kinematic information. The upper panel shows the
2502: behavior of the statistical uncertainty in the determination of the
2503: best-fit \ml, i.e., the $1\sigma$ interval around the minimum of the
2504: corresponding parabolae in Figures \ref{fig:MLparabN} and
2505: \ref{fig:MLparab2}. The dashed lines in the upper panel have a slope
2506: of $-1/2$ and serve to demonstrate that the errors given by the \sch
2507: code roughly satisfy the $N^{-1/2}$ scaling expected from number
2508: statistics. The bottom panel shows the difference between the input
2509: mass-to-light ratio ($\Upsilon_0^*$) and the best-fit \ml\, given by
2510: the \sch code (i.e., the minimum of the parabolae of Figures
2511: \ref{fig:MLparabN} and \ref{fig:MLparab2}). The error bars are the
2512: $1\sigma$ errors from the upper panel. All \sch models in this figure
2513: have been computed using our small orbit library, with
2514: $20\times14\times7$ combinations of the $(E,L_z,I_3)$ integrals of
2515: motion. }
2516: \end{figure}
2517: 
2518: \clearpage
2519: 
2520: \begin{figure}
2521: \vspace*{0cm}
2522: %\plotone{delta_chisqr.fine_grid.55is_los_mu.eps}
2523: \plotone{f10.eps}
2524: %{\hspace*{-18cm}
2525: %\plotfiddle{55is.onlyLOS_vs_onlyMU.1000.ps}{1cm}{0}{1.0}{1.0}{0}{0}}
2526: %\vspace*{-3cm}
2527: \caption{\label{fig:55is_LOS_MU} Comparison of discrete \sch models
2528: based on data consisting purely of LOS velocities (upper panel) and
2529: purely of proper motions (lower panel), for the 55is dataset with 1000
2530: kinematic measurements and libraries with $20\times14\times7$
2531: orbits. The lines are $\Delta\chi^2$ contours overlaid on grids of
2532: actually computed models (indicated by the small dots) with different
2533: combinations of inclination and mass-to-light ratio \ml. The correct
2534: input model is indicated as a large black dot, and the best-fit model
2535: as a star.  The first three contours are spaced in increments of
2536: $1\sigma$ confidence, with the $3\sigma$ contour (99.7\% confidence
2537: level) highlighted with a thick line. Discrete \sch fits to LOS
2538: velocities alone and to proper motions alone both satisfactorily recover the
2539: input mass-to-light ratio, but not the input inclination. In terms of
2540: the uncertainties in the best-fit parameters (i.e., the size of the
2541: confidence intervals), proper motions provide tighter constraints than
2542: LOS velocities alone. }
2543: \end{figure}
2544: 
2545: \clearpage
2546: 
2547: \pagestyle{empty}
2548: \begin{figure}
2549: \vspace*{-25mm}
2550: %\plotone{delta_chisqr.fine_grid.library_size.eps}
2551: \plotone{f11.eps}
2552: %{\hspace{-15cm}
2553: %\plotfiddle{}{1cm}{0}{400}{500}{-25}{0}}
2554: %\vspace*{-3cm}
2555: \caption{\label{fig:grid_55_90} Recovery of the input inclination and
2556: mass-to-light ratio, when both are assumed unknown, for the 55is and 90is
2557: datasets (left- and right-hand panels, respectively). Shown are the
2558: $\Delta\chi^2$ contours obtained from grids of \sch models constructed
2559: using orbit libraries with different sampling of the available
2560: $(E,L_z,I_3)$ integral space. Upper panels correspond to libraries
2561: with $20\times14\times7$ orbits, while lower panels are based on
2562: libraries with $40\times28\times14$ orbits, i.e., with 8 times finer
2563: sampling.  The input mass-to-light ratio $\Upsilon_0^*$ is
2564: satisfactorily recovered regardless of the number of orbits (in all
2565: cases inside the $2\sigma$ confidence level). In terms of inclination,
2566: the shapes of the contours indicate that there may be two separate
2567: maxima providing similarly good fits to the data. For the smaller
2568: orbit library, the best-fit inclinations converge to the wrong
2569: solution, $i\approx 70\grad$, for both datasets. Nevertheless, the
2570: correct inclination is encompassed by the secondary maximum in the
2571: 55is case (upper left), and a clear elongation of the contours towards
2572: higher inclination is seen in the 90is case (upper right). When using
2573: the larger orbit library, however, the best-fit inclination is
2574: $i=57\grad$ for the 55is dataset, and $i=80\grad$ for the 90is
2575: dataset, in better agreement with the true values.  }
2576: \end{figure}
2577: 
2578: \clearpage
2579: 
2631: 
2632: \end{document}
2633: 
2636: