1: % $Id: sdss.tex,v 1.104 2007/07/28 02:07:43 oyachai Exp $
2:
3: %\documentclass[12pt,preprint]{aastex}
4: \documentclass[11pt,preprint]{emulateapj}
5: \usepackage{graphicx,natbib}
6: \usepackage{url}
7: \usepackage{color}
8: %\usepackage{amssymb}
9:
10: \citestyle{aa}
11:
12: \newcommand{\zphot}{$z_{\rm phot}$}
13: \newcommand{\zspec}{$z_{\rm spec}$}
14: \newcommand{\sigmoid}{{\rm s}}
15:
16:
17: \begin{document}
18:
19: \title{A Galaxy Photometric Redshift Catalog for the Sloan Digital Sky Survey Data Release 6}
20:
21: \author{
22: Hiroaki Oyaizu$^{1,2}$,
23: Marcos Lima$^{2,3}$,
24: Carlos E. Cunha$^{1,2}$,
25: Huan Lin$^{4}$,
26: Joshua Frieman$^{1,2,4}$,
27: Erin S. Sheldon$^{5}$
28: }
29:
30: \affil{
31: ${}^{1}$Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637 \\
32: ${}^{2}$Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637 \\
33: ${}^{3}$Department of Physics, University of Chicago, Chicago, IL 60637 \\
34: ${}^{4}$Center for Particle Astrophysics, Fermi National Accelerator Laboratory, Batavia, IL 60510 \\
35: ${}^{5}$Center for Cosmology and Particle Physics and Department of Physics, New York University, New York, NY 10003 \\
36: }
37:
38:
39: %\date{\today}
40:
41: %-----------------------------------------------------------------------------
42:
43: \begin{abstract}
44:
45: We present and describe a catalog of galaxy photometric redshifts (photo-z's)
46: for the Sloan Digital Sky Survey (SDSS) Data Release 6 (DR6).
47: We use the Artificial Neural Network (ANN) technique to calculate photo-z's
48: and the Nearest Neighbor Error (NNE) method to estimate photo-z errors for
49: $\sim$ 77 million objects classified as galaxies in DR6 with $r < 22$.
50: The photo-z and photo-z error estimators are trained and validated on a
51: sample of $\sim 640,000$ galaxies that have SDSS photometry and
52: spectroscopic redshifts measured by SDSS, 2SLAQ, CFRS, CNOC2, TKRS,
53: DEEP, and DEEP2.
54: For the two best ANN methods we have tried,
55: we find that 68\% of the galaxies in the validation set have a photo-z
56: error smaller than
57: $\sigma_{68} =0.021$ or $0.024$.
58: After presenting our results and quality tests, we provide a short guide
59: for users accessing the public data.
60:
61: \end{abstract}
62:
63: \keywords{photometric redshifts sdss -- Sloan Digital Sky Survey}
64:
65: %\maketitle
66: %------------------------------------------------------------------------------
67:
68:
69: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
70: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
71: \section{Introduction}\label{int}
72: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
73: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
74:
75: While spectroscopic redshifts have now been measured for over one million
76: galaxies, in recent years
77: digital sky surveys have obtained multi-band imaging
78: for on the order of a hundred million galaxies. Deep, wide-area surveys planned for
79: the next decade will increase the number of galaxies with
80: multi-band photometry to a few billion. Due to technological and financial
81: constraints, obtaining spectroscopic redshifts for more than a
82: small fraction of these galaxies will remain impractical for the foreseeable
83: future. As a result, over the last decade substantial effort has gone into
84: developing photometric redshift (photo-z) techniques, which use
85: multi-band photometry to estimate approximate galaxy redshifts. For many
86: applications in extragalactic astronomy and cosmology, the resulting
87: photometric redshift precision is sufficient for the science goals at
88: hand, provided one can accurately characterize the uncertainties in the
89: photo-z estimates.
90:
91: Two broad categories of photo-z estimators are in wide use:
92: template-fitting and training set methods. In template-fitting, one
93: assigns a redshift to a galaxy by finding
94: the redshifted spectral energy distribution (SED), selected
95: from a library of templates,
96: that best reproduces the observed fluxes in the broadband filters.
97: By contrast, in the training set approach, one
98: uses a training set of galaxies with
99: spectroscopic redshifts and photometry to derive an empirical relation
100: between photometric observables (e.g., magnitudes, colors, and morphological
101: indicators) and redshift.
102: Examples of empirical methods include Polynomial Fitting \citep{con95b},
103: the Nearest Neighbor method \citep{csa03},
104: the Nearest Neighbor Polynomial (NNP) technique \citep{cun07},
105: Artificial Neural Networks (ANN) \citep{col04,van04,dab07}, and
106: Support Vector Machines \citep{wad04}. When a large spectroscopic
107: training set that is representative of the photometric data set to be
108: analyzed is
109: available, training set techniques typically outperform template-fitting
110: methods, in the sense that the photo-z estimates have smaller scatter
111: and bias with respect to the true redshifts \citep{cun07}. On the
112: other hand, template-fitting can be applied to a photometric sample
113: for which relatively few spectroscopic analogs exist.
114: For a comprehensive review and comparison of photo-z methods,
115: see \cite{cun07}.
116:
117: In this paper, we present a publicly available galaxy photometric redshift
118: catalog for the Sixth Data Release (DR6) of the Sloan Digital Sky
119: Survey (SDSS) imaging catalog \citep{bla03b,eis01,gun98,ive04,str02,yor00}.
120: We use the ANN photo-z method, which we have shown to
121: be a superior training set method \citep{cun07}, and briefly compare the
122: results using different photometric observables.
123: We also compare the ANN results with those from NNP, an empirical
124: method which achieves similar performance to the ANN method \citep{cun07}. %briefly with other methods.
125: Since the SDSS photometric catalog covers a large area of sky, a number
126: of deep spectroscopic galaxy samples with SDSS photometry are available
127: to use as training sets, as shown in Fig.~\ref{dist.sdss}.
128: In combination, these spectroscopic samples cover the full apparent
129: magnitude range of the SDSS photometric sample.
130:
131: The paper is organized as follows.
132: In \S \ref{sel} we briefly describe the SDSS DR6 photometric catalog
133: and the selection criteria used
134: to obtain the galaxy photometric sample from the catalog.
135: In \S \ref{tra} we describe the spectroscopic catalogs used
136: to construct the photo-z training and validation sets.
137: In \S \ref{met} we outline the photo-z methods as well as the
138: photo-z error estimator technique applied to the galaxy sample.
139: Statistical results for photometric redshift performance, errors,
140: and redshift distributions
141: are presented in \S \ref{res}. In \S \ref{rec}
142: we make recommendations for possible
143: additional cuts on the photo-z catalog based on our
144: own flags and those in the SDSS database.
145: In \S \ref{cat} we briefly describe how to access the
146: photo-z catalog from the public SDSS data server, and in \S \ref{con} we
147: present our conclusions. For completeness, Appendix \ref{query}
148: provides the database query used to select the photometric sample,
149: Appendix \ref{stargal} discusses issues of star-galaxy separation,
150: and Appendix \ref{photdr5} briefly describes an earlier version
151: of the photo-z algorithm used for SDSS DR5 \citep{ade07}.
152:
153: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
154: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
155: \section{SDSS Photometric Catalog and Galaxy Selection}
156: \label{sel}
157: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
158: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
159:
160: The SDSS comprises a large-area
161: imaging survey of the north Galactic cap, a multi-epoch imaging survey of
162: an equatorial stripe in the south Galactic cap, and a spectroscopic survey of
163: roughly $10^6$ galaxies and $10^5$ quasars
164: \citep{yor00}.
165: The survey uses a dedicated, wide-field, 2.5m telescope \citep{gun06} at
166: Apache Point Observatory, New Mexico.
167: Imaging is carried out in drift-scan mode using a 142 mega-pixel camera
168: \citep{gun98} that gathers data in five broad bands, $u g r i z$, spanning
169: the range from 3,000 to 10,000 \AA \, \citep{fuk96}, with an effective exposure
170: time of 54.1 seconds per band.
171: The images are processed using specialized
172: software \citep{lup01,sto02} and are
173: astrometrically \citep{pie03} and photometrically \citep{hog01,tuc06}
174: calibrated using observations of a set of primary standard stars
175: \citep{smi02} made with a neighboring 20-inch telescope.
176:
177: The imaging in the sixth SDSS Data Release (DR6) covers an essentially
178: contiguous region of the north Galactic cap, with only a few small patches
179: remaining to be observed. In any region where imaging runs overlap, one run is
180: declared primary\footnote{For the precise definition of primary objects see
181: {\tt http://cas.sdss.org/dr6/en/help/docs/glossary.asp\#P}}
182: and is used for spectroscopic target selection;
183: other runs are declared secondary.
184: The area covered by the DR6 primary imaging survey, including the
185: southern stripes, is $8417 \textrm{ deg}^2$, but
186: DR6 includes both the primary and secondary observations of
187: each area and source \citep{dr6}.
188:
189: \begin{figure}
190: \begin{minipage}[t]{85mm}
191: \begin{center}
192: \resizebox{85mm}{!}{\includegraphics[angle=0]{f1.c.eps}}
193: \end{center}
194: \end{minipage}
195: \caption{Normalized $r$ magnitude distributions for various catalogs.
196: {\it Top three rows:}
197: the distributions of the spectroscopic catalogs used for photo-z
198: training and validation are
199: shown for 2SLAQ, CFRS, CNOC2, TKRS,
200: DEEP and DEEP2, and the SDSS spectroscopic sample.
201: $N_{tot}$ denotes the total number of galaxy measurements used
202: from each catalog; for galaxies in regions with repeat SDSS imaging,
203: each independent photometric measurement is counted separately.
204: {\it Bottom row:} ({\it left})---the distribution of the combined
205: spectroscopic sample; ({\it right})---the
206: distribution for the SDSS photometric galaxy sample, where
207: objects were classified as galaxies according to the
208: photometric TYPE flag (see text).
209: }\label{dist.sdss}
210: \end{figure}
211:
212: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
213: %\section{Photometric selection of the galaxies} \label{sel}
214: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
215:
216:
217: The SDSS database provides a variety of measured magnitudes for each
218: detected object. Throughout this paper, we use dereddened model magnitudes to
219: perform the photometric redshift computations. To determine the model
220: magnitude, the SDSS photometric pipeline fits two
221: models to the image of each galaxy in each passband: a de Vaucouleurs (early-type) and
222: an exponential (late-type) light profile.
223: The models, which have arbitrary axis ratio and position angle, are
224: convolved with the estimated point spread function (PSF).
225: The best-fit model in the $r$ band (which is used to fix the model scale
226: radius) is then applied to the other passbands and convolved with the
227: passband-dependent PSFs to yield the model magnitudes.
228: Model magnitudes provide an unbiased color estimate in the absence of color
229: gradients \citep{sto02}, and the dereddening procedure removes the
230: effect of Galactic extinction \citep{sch98}.
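
\noindent As a brief illustration of the last step, and using notation introduced here only for this purpose, the dereddened model magnitude in passband $b$ can be written as

\begin{equation}
m_b^{\rm dered} = m_b^{\rm model} - A_b ,
\end{equation}

\noindent where $A_b$ is the Galactic extinction in that passband, in magnitudes, along the line of sight to the object, as derived from the dust maps of \cite{sch98}.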
231:
232: %%%%%%%%%%%%%%%%%%
233:
234: \begin{deluxetable}{c c | c c}
235: \tablewidth{0pt}
236: \tablecaption{Photometric Sample Properties}
237: \startdata
238: \hline
239: \hline
240: \multicolumn{2}{c}{\hspace{0.1 in} AB magnitude limits \hspace{0.2 in} }
241: &\multicolumn{2}{c}{\hspace{0.2 in} RMS photometric \hspace{0.4 in}} \\
242: \multicolumn{2}{c}{}
243: & \multicolumn{2}{c}{\hspace{0.2 in} calibration errors } \\
244: \hline
245: \hspace{0.1 in} $u$ & 22.0 & \hspace{0.4 in} $r$ & 2\% \\
246: \hspace{0.1 in} $g$ & 22.2 & \hspace{0.4 in} $u-g$ & 3\% \\
247: \hspace{0.1 in} $r$ & 22.2 & \hspace{0.4 in} $g-r$ & 2\% \\
248: \hspace{0.1 in} $i$ & 21.3 & \hspace{0.4 in} $r-i$ & 2\% \\
249: \hspace{0.1 in} $z$ & 20.5 & \hspace{0.4 in} $i-z$ & 3\% \\
250: \enddata
251: \tablecomments{Magnitude limits are for 95\% completeness for point
252: sources in typical seeing; 50\% completeness numbers are generally
253: 0.4 mag fainter \citep{ade07}. The median seeing for the SDSS imaging
254: survey is $1.4''$.
255: } \label{propphot}
256: \end{deluxetable}
257:
258: To construct the photometric sample of galaxies for which we wish to
259: estimate photo-z's, we obtained
260: a catalog drawn from the SDSS CasJobs website
261: {\tt http://casjobs.sdss.org/casjobs/}.
262: We checked some of the SDSS photometric flags to ensure that we have obtained
263: a reasonably clean galaxy sample. In particular,
264: we selected all primary objects from DR6 that have the TYPE flag
265: equal to $3$ (the type for galaxy) and that do not
266: have any of the flags BRIGHT, SATURATED, or SATUR\_CENTER set.
267: %NOPETRO\_BIG set.
268: For the definitions of these flags we refer the reader to the
269: PHOTO flags entry at the SDSS
270: website\footnote{{\tt http://cas.sdss.org/dr6/en/help/browser/browser.asp}}
271: or to Appendix \ref{query}.
272: We also took into account the nominal SDSS flux limit
273: (see Table~\ref{propphot}) by only selecting galaxies with dereddened model
274: magnitude $r<22.0$.
275: The full database query we used is given in Appendix \ref{query}.
276:
277: The photometric galaxy catalog we have selected suffers from impurity and
278: incompleteness at some level, since
279: the photometric pipeline cannot
280: separate stars from galaxies with 100\% success
281: at faint magnitudes. We
282: describe some of our tests of star/galaxy separation in
283: Appendix \ref{stargal}, where we show that the SDSS TYPE flag
284: provides star/galaxy separation performance similar to other
285: methods.
286:
287: \begin{figure}
288: \begin{minipage}[t]{85mm}
289: \begin{center}
290: \resizebox{85mm}{!}{\includegraphics[angle=0]{f2.c.eps}}
291: \end{center}
292: \end{minipage}
293: \caption{Distribution of $g-r$ and $r-i$ colors for different SDSS samples. {\it Top row:} the color distributions for galaxies in the SDSS spectroscopic
294: sample.
295: {\it Middle row:} the color distributions for galaxies in the other (non-SDSS)
296: spectroscopic training samples.
297: {\it Bottom row:} the color distributions for galaxies in the photometric
298: sample.
299: As above, galaxy/star classification used the photometric TYPE flag.
300: }\label{dist.color.sdss}
301: \end{figure}
302:
303: The final photometric sample comprises $77,418,767$ galaxies.
304: The $r$ magnitude distribution of this sample is shown in
305: the bottom right panel of Fig.~\ref{dist.sdss}; the $g-r$ and
306: $r-i$ color distributions
307: are shown in the bottom panels of Fig.~\ref{dist.color.sdss}.
308:
309: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
310: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
311: \section{Spectroscopic Training and Validation Sets} \label{tra}
312: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
313: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
314:
315:
316: Since our methods to estimate photo-z's and photo-z errors are
317: training-set based, we would ideally like the spectroscopic
318: training set to be
319: fully representative of the photometric sample to be analyzed, i.e., to have
320: similar statistical properties and magnitude/redshift distributions.
321: Training-set methods can be thought of as inherently Bayesian, in the sense
322: that the training-set distributions form effective priors for the analysis of the
323: photometric sample; to the extent that the training-set distributions
324: reflect those of the photometric sample, we may expect the photo-z estimates
325: to be unbiased (or at least they will not be biased by the prior).
326: Given the practical difficulties of carrying out spectroscopy at
327: faint magnitudes and low surface brightness, such an ideal generally cannot be achieved.
328: Realistically, all we can hope for is a training set that
329: (a) is large enough that statistical fluctuations are small and (b)
330: spans the same magnitude, color, and redshift ranges as the photometric sample.
331: Fortunately, our tests indicate that the estimated photo-z's
332: depend only weakly on the shape of the
333: redshift and magnitude distributions of the training set for the SDSS.
334:
335:
336: \begin{figure*}
337: \begin{center}
338: \resizebox{150mm}{!}{\includegraphics[angle=0]{f3.eps}}
339: \caption{A simple FFMP network with 3 layers and configuration $2:1:1$.
340: The inputs are the
341: two magnitudes, $m_1$ and $m_2$.
342: Ix denotes the input to node x, and Ox is the corresponding output of this node.
343: The weights $w$ associated with each connection are found by training the network
344: using training and validation sets (see text).}
345: \label{NNsimple}
346: \end{center}
347: \end{figure*}
348:
349:
350: We have constructed a spectroscopic sample consisting of $639,911$
351: galaxies that have SDSS photometry measurements
352: (counting repeats; see below) and that have
353: spectroscopic redshifts measured by the SDSS or by
354: other surveys, as described below.
355: We imposed a magnitude limit of $r<23.0$ on the spectroscopic
356: sample and applied
357: additional cuts on the quality of the spectroscopic
358: redshifts reported by the different surveys.
359: Since we impose a limit of $r<22.0$ for the SDSS photometric sample,
360: the fainter limit chosen
361: for the spectroscopic training sample accommodates the full photometric
362: range of interest without creating boundary effects for photo-z's of
363: galaxies with magnitudes near the photometric sample limit of $r = 22$.
364: Each survey providing spectroscopic redshifts defines a redshift
365: quality indicator; we refer the reader to the respective publications listed
366: below for their precise definitions.
367: For each survey, we chose a redshift quality cut roughly corresponding
368: to 90\% redshift confidence or greater.
369: The SDSS spectroscopic sample
370: provides $531,672$ redshifts, principally from the MAIN and
371: Luminous Red Galaxy (LRG) samples, with confidence level
372: $z_{\rm conf} > 0.9$. The remaining redshifts are:
373: $21,123$ from the Canadian Network for Observational Cosmology
374: Field Galaxy Survey \citep[CNOC2;][]{yee00},
375: $1,830$ from the Canada-France Redshift Survey \citep[CFRS;][]{lil95}
376: with Class $> 1$,
377: $31,716$ from the Deep Extragalactic Evolutionary Probe \citep[DEEP;][]{deep2}
378: with $q_z$ = A or B and from DEEP2
379: \citep{wei05}\footnote{{\tt http://deep.berkeley.edu/DR2/ }}
380: with $z_{\rm quality} \geq 3$,
381: $728$ from the Team Keck Redshift Survey \citep[TKRS;][]{wir04}
382: with $z_{\rm quality} > -1$, and
383: $52,842$ LRGs from the
384: 2dF-SDSS LRG and QSO Survey
385: \citep[2SLAQ;][]{can06}\footnote{{\tt http://lrg.physics.uq.edu.au/New\_dataset2/ }}
386: with $z_{\rm op} \geq 3$.
387:
388: We positionally matched the galaxies with spectroscopic redshifts against photometric
389: data in the SDSS {\tt BestRuns} CAS database, which allowed us
390: to match with photometric measurements in different SDSS imaging runs.
391: The above numbers of galaxies with redshifts count each independent photometric
392: measurement of the same object separately, since some regions were imaged multiple
393: times by the SDSS; in particular, SDSS Stripe 82 has been imaged a number of times.
394: The numbers of {\em unique} galaxies used from these surveys are
395: $1,435$ from CNOC2,
396: $272$ from CFRS,
397: $6,049$ from DEEP and DEEP2,
398: $389$ from TKRS, and
399: $11,426$ from 2SLAQ.
400: The SDSS spectroscopic samples were drawn from the SDSS primary galaxy sample and therefore are all unique.
401: The spectroscopic sample obtained by combining all these catalogs,
402: including the repeats, was divided into two catalogs of the
403: same size ($\sim 320,000$ objects each).
404: One of these catalogs was taken to be
405: the {\it training set} used by the photo-z and error estimators, and the other
406: was used as a {\it validation set} to carry out tests of photo-z
407: quality (see \S \ref{subsec:meth_photoz}). Our tests indicate that this
408: procedure of treating different
409: images of the same training/validation set galaxies as independent objects leads
410: to good results, provided all the photometric measurements for a given object
411: are confined to either the training set or the validation set and not mixed. By
412: contrast,
413: excluding such multiple images from the spectroscopic sample would result
414: in much smaller training and validation sets; these would be very sparse at
415: faint magnitudes, leading to much diminished photo-z quality there. On the other
416: hand, splitting
417: the repeat images of a given object between the training and validation sets
418: may result in ``over-fitting'' of the derived photo-z's
419: (see \S \ref{subsec:meth_photoz}).
420:
421: The $r$-magnitude and color ($g-r$ and $r-i$)
422: distributions for the spectroscopic samples and for
423: the photometric sample are shown in Figs. \ref{dist.sdss} and
424: \ref{dist.color.sdss}. While the magnitude and color distributions of
425: the combined spectroscopic sample are not
426: identical to those of the photometric sample, the
427: spectroscopic sample does span the
428: range of apparent magnitude and color of the photometric sample.
429: To test the impact of having a training set that is not fully representative
430: of the photometric sample, we
431: divided the spectroscopic sample into smaller, alternate training and
432: validation sets. For instance,
433: to test the effect of the training-set magnitude distribution on the
434: photo-z estimates, we created a training set with a flat $r$
435: magnitude distribution and another with an $r$ magnitude distribution similar to that
436: of the
437: photometric sample. Our tests indicated that the photo-z quality
438: is not strongly affected by the magnitude
439: distribution of the training set.
440: The changes in the photo-z performance metrics
441: (the rms scatter and the 68\% CL region, defined below in
442: \S \ref{res}) were smaller than $10\%$ when the training-set magnitude
443: distribution was varied between these different choices.
444: Since using the entire spectroscopic
445: sample for the training and validation sets produced marginally better results
446: than all other cases tested, we have adopted this as our final choice. In addition,
447: we tested the effect of the size of the training set on
448: our photo-z calculations. We found that the photo-z performance metrics
449: defined in \S \ref{res-photoz}
450: are degraded by no more than 10\% when the training set is artificially
451: reduced to 10\% of its original size. Even when the training set is
452: reduced to $\sim 1\%$ of its original size, the photo-z performance metrics are
453: degraded by less than $25\%$. This gives us confidence that
454: the spectroscopic training set size used here is sufficient for extracting
455: nearly optimal photo-z estimates.
456:
457: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
458: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
459: \section{Methods}\label{met}
460: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
461: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
462:
463: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
464: \subsection{ANN and NNP Photometric redshifts}
465: \label{subsec:meth_photoz}
466: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
467:
468: The ANN method that we use to estimate galaxy photo-z's is
469: a general classification and interpolation tool used
470: successfully in an array of fields such as handwriting recognition,
471: automatic aircraft
472: piloting\footnote{{\tt http://www.nasa.gov/centers/dryden/news/NewsReleases/2003/03-49.html}},
473: detecting credit card
474: fraud\footnote{{\tt http://www.visa.ca/en/about/visabenefits/innovation.cfm}},
475: and extracting astronomically interesting sources in a telescope image
476: \citep{bertin96}.
477:
478: We use a particular type of ANN called a Feed Forward Multilayer
479: Perceptron (FFMP) to map the relationship between photometric observables
480: and redshifts.
481: An FFMP network consists of several input nodes, one or more hidden layers,
482: and several output nodes, all interconnected by weighted connections
483: (see Fig.~\ref{NNsimple}).
484: We follow the notation of \cite{col04} and denote a network with
485: $N_i$ input nodes, $N_{h_j}$ nodes in hidden layer $j$, and $N_o$
486: output nodes as $N_i:N_{h_1}:N_{h_2}:...:N_{h_m}:N_o$.
487: For each input object, the input photometric
488: data (e.g., magnitudes, colors, concentrations, etc.)
489: are fed into the input
490: nodes of the FFMP, which fire signals according to the values of the
491: input data.
492: Each node in a hidden layer receives a total input which is a weighted
493: sum of the outputs from the nodes in the previous layer,
494: i.e., node $i$ in a hidden layer receives an input $I_i$ given by
495:
496: \begin{equation}
497: I_i = \sum_j w_{ij} O_j,
498: \end{equation}
499:
500: \noindent where $O_j$ is the output of the $j^{\rm th}$ node of the previous
501: layer and $w_{ij}$ is the weight of the connection between node $i$ in
502: the hidden layer and node $j$ in the previous layer.
503: Given the input $I_i$, the output $O_i$ of node $i$ is a function $f$ of the
504: input,
505:
506: \begin{equation}
507: O_i=f(I_i), \label{act}
508: \end{equation}
509:
510: \noindent where $f$ is the activation function.
511: Repeating this process, signals propagate up to the output nodes.
512: The activation function is typically a sigmoid function:
513:
514: \begin{equation}
515: f(I_i) = \frac{1}{1 + e^{-I_i}}. \label{sigm}
516: \end{equation}
517:
518: \noindent However, there are various alternatives, such as step
519: functions and hyperbolic tangents.
520: \cite{van04} show that the choice of activation functions makes
521: no significant difference in the result.
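
\noindent One way to see why this choice matters little, offered here as an illustrative remark rather than as the argument of \cite{van04}, is that the hyperbolic tangent is a shifted and rescaled version of the sigmoid of Eq.~(\ref{sigm}):

\begin{equation}
\tanh(I_i) = \frac{e^{I_i} - e^{-I_i}}{e^{I_i} + e^{-I_i}} = 2 f(2 I_i) - 1 ,
\end{equation}

\noindent so that, at least when constant offset terms are allowed at each node (an assumption made for this remark), a network built from one of these two functions can reproduce a network built from the other by rescaling and shifting its weights.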
522:
523: We use $X$:20:20:20:1 networks to estimate photo-z's, where $X$ is the
524: number of input photometric parameters per galaxy. % in this work.
525: The corresponding number of degrees of freedom (the number of weights) is
526: roughly 1,000, depending on the actual value of $X$.
527: We use hyperbolic tangent functions as the activation function of the
528: hidden layers and a linear activation function for the output layer.
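
\noindent As a rough check on this number, and assuming for simplicity that only the connections between adjacent layers are counted (i.e., ignoring any constant offset terms, whose treatment is not specified here), the number of weights in an $X$:20:20:20:1 network is

\begin{equation}
N_w = 20X + 20 \times 20 + 20 \times 20 + 20 \times 1 = 20X + 820 ,
\end{equation}

\noindent which gives $N_w = 920$ for $X=5$ input parameters and $N_w = 1020$ for $X=10$. Including one offset per hidden and output node would add 61 more parameters under this assumption.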
529:
530: Despite the occasional aura of mystery surrounding neural networks,
531: an FFMP is nothing more than a complex
532: mathematical function; in fact, one can always write down the analytic
533: expression corresponding to a neural network function.
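
\noindent For example, for the simple $2:1:1$ network of Fig.~\ref{NNsimple}, and assuming that the input nodes pass their values on unchanged, that there are no constant offset terms, and that the output activation is linear, the network output reduces to

\begin{equation}
z_o = w_3 \, f\left( w_1 m_1 + w_2 m_2 \right) ,
\end{equation}

\noindent where $w_1$ and $w_2$ are the weights connecting the two input nodes to the hidden node and $w_3$ is the weight connecting the hidden node to the output node (labels introduced here for illustration only). Under the same assumptions, an $X$:20:20:20:1 network with hyperbolic tangent hidden layers corresponds to

\begin{equation}
z_o = \sum_{l} w^{(4)}_{l} \tanh\left\{ \sum_{k} w^{(3)}_{lk} \tanh\left[ \sum_{j} w^{(2)}_{kj} \tanh\left( \sum_{i} w^{(1)}_{ji} x_i \right) \right] \right\} ,
\end{equation}

\noindent where $x_i$ are the $X$ input observables, the sums over $j$, $k$, and $l$ each run over the 20 nodes of a hidden layer, and the superscript on each weight labels the layer of connections to which it belongs.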
534:
535: Once the network configuration is specified, it can be trained to
536: output an estimate of redshift given the input photometric observables.
537: The training process involves
538: finding the set of weights $w_{ij}$ that
539: minimize a score function $E$, chosen here to be
540:
541: \begin{equation}
542: E = \frac{1}{2}\sum_i(z_{\rm spec}^{i} - z_o^{i})^2 ~,
543: \label{eq:score}
544: \end{equation}
545:
546: \noindent where $z_{\rm spec}$ is the measured spectroscopic redshift, $z_o$ is the
547: redshift returned by the output node, and the sum is over all galaxies
548: in the training set. Note that the choice of score function is not unique,
549: and different choices will in general lead to different photo-z estimates.
550: The minimization of this score function can be done efficiently
551: because its derivatives with respect to the weights
552: are available analytically.
553: We use a Variable Metric method as described in \cite{pre92} for the minimization.
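
\noindent To make the statement about analytic derivatives concrete, differentiating Eq.~(\ref{eq:score}) with respect to any weight $w_{jk}$ gives

\begin{equation}
\frac{\partial E}{\partial w_{jk}} = - \sum_i \left( z_{\rm spec}^{i} - z_o^{i} \right) \frac{\partial z_o^{i}}{\partial w_{jk}} ,
\end{equation}

\noindent where the sum is again over training-set galaxies. The remaining derivative follows from repeated application of the chain rule to the weighted sum defining $I_i$ and to Eq.~(\ref{act}), using $f'(I) = 1 - f(I)^2$ for a hyperbolic tangent activation or $f'(I) = f(I)[1 - f(I)]$ for the sigmoid of Eq.~(\ref{sigm}). This is the standard back-propagation result and is given here as a sketch, not as a description of the specific implementation used in this work.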
554:
555: In machine learning, over-fitting refers
556: to the tendency of an algorithm with many adjustable parameters
557: to fit to the noise in the training set data.
558: In order to avoid over-fitting, we use the technique of
559: early stopping.
560: The spectroscopic sample is divided into two
561: independent subsets, the
562: {\it training} and {\it validation} sets,
563: and the formal minimizations are done using the training set.
564: After each minimization step, the network is evaluated on the
565: validation set, and
566: the set of weights that performs best on the validation set
567: is chosen as the final set. Another issue in machine learning is that
568: minimization procedures that start at different initial choices of weights
569: generally end at different local minima of the score
570: function.
571: To reduce the chance of ending in a less-than-optimal local minimum,
572: we minimize five networks starting at different positions in the space of weights.
573: Among these, we choose the network that gives the lowest photo-z scatter
574: (cf. Eq. \ref{eq:score})
575: in the validation set.
576: For more details of our implementation of the ANN and its performance on
577: mock catalogs and real data, see \cite{cun07}.
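
\noindent Schematically, and with notation introduced here only for illustration, let $w^{(k)}_t$ denote the weights of the $k^{\rm th}$ network ($k=1,\ldots,5$) after minimization step $t$, and let $E_{\rm val}$ be the score of Eq.~(\ref{eq:score}) evaluated on the validation set rather than on the training set. The selection described above then amounts to

\begin{equation}
\hat{w} = w^{(\hat{k})}_{\hat{t}} , \qquad (\hat{k},\hat{t}) = \arg\min_{k,\,t} E_{\rm val}\left( w^{(k)}_t \right) ,
\end{equation}

\noindent under the assumption that ``performs best'' and ``lowest photo-z scatter'' are both measured by $E_{\rm val}$; a different validation metric would lead to the same form with $E_{\rm val}$ replaced accordingly.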
578:
579: The ANN photo-z algorithm is very flexible in the sense that it is easy
580: to change the input parameters, the training set, and the network configurations.
581: We tried a variety of combinations of possible input photometric
582: observables to see their effects on photo-z quality.
583: We calculated photo-z's using galaxy magnitudes, colors, and the
584: concentration indices for some or all of the passbands.
585: The concentration index $c_i$ in passband $i$ is defined as the ratio of {\tt PetroR50}
586: to {\tt PetroR90}, the radii that enclose 50\% and 90\% of the
587: Petrosian flux, respectively. Early-type (E and S0) galaxies, with centrally
588: peaked surface brightness profiles, tend to have low values of the
589: concentration index, while late-type spirals, with quasi-exponential light
590: profiles, typically have higher values of $c_i$.
591: Previous studies \citep{morg58,shi01,yam05,par05} have shown
592: that the concentration parameter correlates well
593: with galaxy morphological type, and we used it to help break the
594: degeneracy between redshift and galaxy type.
595: We present the photo-z results for different combinations of input
596: parameters in \S\ref{res}.
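
\noindent Written explicitly, with $R_{50,i}$ and $R_{90,i}$ used here as shorthand for the catalog quantities {\tt PetroR50} and {\tt PetroR90} in passband $i$, the concentration index defined above is

\begin{equation}
c_i = \frac{R_{50,i}}{R_{90,i}} ,
\end{equation}

\noindent so that $0 < c_i < 1$ for any object, since the radius enclosing 50\% of the Petrosian flux is smaller than the radius enclosing 90\%.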
597:
598: For comparison, we also computed photo-z's for the
599: validation set using another empirical method, the Nearest Neighbor
600: Polynomial (NNP) technique \citep{cun07}.
601: In NNP, to derive a photo-z for a galaxy in the photometric sample,
602: we look for its training-set nearest neighbors in the space of
603: photometric observables (magnitudes, colors, etc.).
604: Suppose we have $N_D$ photometric data entries for each galaxy.
605: The data vector for the galaxy of interest in the photometric sample is
606: denoted by $\ D^{\mu}=(D^1,D^2,...,D^{N_D})$,
607: while the data vector for the $i^{\rm th}$ galaxy in the training set is
608: $\ D^{\mu}_i=(D^1_i,D^2_i,...,D^{N_D}_i)$.
609: The distance $d_i$ between the photometric object and the $i^{\rm th}$
610: training set galaxy is defined using a flat metric in data space,
611:
612: \begin{equation}
613: d_i^2 = \sum_{\mu=1}^{N_D} (D^{\mu} - D_{i}^{\mu})^2~. \label{nndef}
614: \end{equation}
615:
616:
617: \noindent The nearest neighbors are the training-set objects
618: with the smallest values of $d_i$. Once the nearest neighbors for a given
619: galaxy are identified,
620: they are used to fit the coefficients of a local, low-order polynomial relation
621: between photometric observables and redshift.
622: The galaxy photo-z is then obtained by applying
623: the derived relation to the photometric object.
624:
625: For the NNP method employed in this work, we take the
626: photometric data $D^{\mu}$ in Eq.~(\ref{nndef})
627: to be the four ``adjacent'' galaxy colors $u-g, \ g-r, \ r-i, \ i-z$; we found that
628: this choice produces results marginally better than using the galaxy
629: magnitudes.
630: We use the nearest $1000$ neighbors to fit a quadratic polynomial
631: relation between redshift and the photometric data, here chosen
632: to be the five magnitudes in each passband ($ugriz$) and their
633: corresponding concentration indices.
634: We note that \cite{wan07} used a similar technique to estimate
635: photo-z's for a small sample of SDSS {\it spectroscopic} galaxies.
636: They applied the Kernel Regression method of order 0, weighting
637: the training-set neighbors and computing photo-z's by using the
638: weighted average of the neighbors' redshifts.
639: Our NNP method is closer to a Kernel Regression of order 2, since
640: we perform quadratic fits; however, we do not apply variable weights to the neighbors
641: but treat them equally in the fit.
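
The NNP steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to build the catalog: the function names {\tt quadratic\_design} and {\tt nnp\_photoz}, the array layout, and the choice to center the fit on the target object are introduced here for illustration only. Neighbors are selected with the flat metric of Eq.~(\ref{nndef}) and enter the quadratic fit with equal weights.

```python
import numpy as np


def quadratic_design(x):
    """Design matrix with a constant, all linear, and all quadratic terms."""
    n, d = x.shape
    cols = [np.ones(n)]
    cols += [x[:, j] for j in range(d)]
    cols += [x[:, j] * x[:, k] for j in range(d) for k in range(j, d)]
    return np.column_stack(cols)


def nnp_photoz(colors, features, train_colors, train_features, train_z, k=1000):
    """Photo-z for one object from an unweighted local quadratic fit.

    colors / train_colors     : observables used only to select neighbors
    features / train_features : observables entering the polynomial
    (hypothetical argument names; the split mirrors the text above)
    """
    # Squared flat-metric distances d_i^2 to every training-set galaxy.
    d2 = np.sum((train_colors - colors) ** 2, axis=1)
    k = min(k, len(train_z))
    nearest = np.argpartition(d2, k - 1)[:k]  # indices of the k smallest d_i
    # Assumption: center the fit on the target object. A quadratic fit is
    # unchanged by this shift, and the constant coefficient then equals the
    # fitted redshift at the target.
    design = quadratic_design(train_features[nearest] - features)
    coeffs, *_ = np.linalg.lstsq(design, train_z[nearest], rcond=None)
    return coeffs[0]
```

With ten input features (five magnitudes and five concentration indices) the quadratic has 66 coefficients, so a fit to 1000 neighbors is well overdetermined.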
642:
643: Whereas the ANN method provides
644: a single, nonlinear, global fit using the whole
645: training set and applies the derived photo-z relation to all photometric objects,
646: the NNP method yields a separate, linear (in parameters), local fit for
647: each photometric object using its neighbors. If
648: the galaxy magnitude-concentration-redshift hypersurface is a differentiable manifold,
649: i.e., if it can be locally approximated by a hyperplane even though it
650: is globally curved, then these two photo-z methods should be roughly
651: equivalent. Indeed, as we show in \S \ref{res}, their performance is very similar.
652:
653: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
654: \subsection{Photometric redshift errors}\label{meter}
655: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
656:
657: We estimated photo-z errors for objects in the photometric catalog using
658: the Nearest Neighbor Error (NNE) estimator \citep{oya07}.
659: The NNE method is training-set based, with
660: a neighbor selection similar to the NNP photo-z estimator; it
assigns photo-z errors to photometric objects by considering the
662: errors for objects with similar multi-band magnitudes in the
663: validation set.
664: We use the validation set, because the photo-z's of the training set could be
665: over-fit, which would result in NNE underestimating the photo-z errors.
666:
667: The procedure to calculate the redshift error for a galaxy in the photometric
668: sample is as follows.
669: We find the validation-set nearest neighbors to the galaxy of
670: interest. In contrast to NNP,
671: where the distance in Eq.~(\ref{nndef}) was defined in color space,
672: the NNE distance is defined in magnitude space, since photo-z errors
673: correlate strongly with magnitude.
674: Since the selected nearest neighbors are in the spectroscopic sample,
675: we know their photo-z errors, $\delta z = z_{\rm phot}-z_{\rm spec}$, where
676: $z_{\rm phot}$ is computed using the ANN or the NNP method.
677: We calculated the $68\%$ width of the $\delta z$ distribution
678: for the neighbors and assigned that number as the photo-z error
679: estimate for the photometric galaxy. Here we selected
680: the nearest $200$ neighbors of each object to estimate its photo-z error.
681: In studies of photo-z error estimators applied
682: to mock and real galaxy catalogs, we found that NNE
683: accurately predicts the photo-z error when the training set is
684: representative of the photometric sample \citep{oya07}.
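
The NNE procedure described above can be sketched as follows. The function name {\tt nne\_error} and the array layout are hypothetical, and the sketch assumes that the ``$68\%$ width'' is the 68th percentile of $|\delta z|$ among the neighbors; other conventions for the width are possible.

```python
import numpy as np


def nne_error(mags, valid_mags, valid_zphot, valid_zspec, k=200):
    """NNE-style photo-z error estimate for one photometric object."""
    # Flat metric in magnitude space (not color space, unlike NNP).
    d2 = np.sum((valid_mags - mags) ** 2, axis=1)
    k = min(k, len(valid_zspec))
    nearest = np.argpartition(d2, k - 1)[:k]  # k nearest validation-set objects
    delta_z = valid_zphot[nearest] - valid_zspec[nearest]
    # Assumption: "68% width" is taken as the 68th percentile of |delta z|.
    return np.percentile(np.abs(delta_z), 68.0)
```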
685:
686: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
687: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
688: \subsection{Estimating the Redshift Distribution}\label{estdist}
689: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
690: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
691:
692: As we shall see in \S \ref{res-photoz}, estimates for
693: galaxy photo-z's suffer from statistical biases that in general
694: cannot be completely removed on an object-by-object basis. However, we
695: can seek an unbiased estimate of the true redshift {\it distribution}
696: for the photometric sample that is independent of individual
697: galaxy photo-z estimates. For some statistical applications,
698: the redshift distribution of the photometric sample, as opposed
699: to individual galaxy photo-z's, is all that is required.
700: One way to estimate this distribution is to
701: assign a weight to every galaxy in the spectroscopic sample
702: such that the {\it weighted} spectroscopic sample has the same
703: distributions of magnitudes and colors as the photometric sample.
704: The $z_{\rm spec}$ distribution of this weighted spectroscopic
705: sample provides an estimate of the true, underlying
706: redshift distribution of the photometric sample.
707:
708: The weight $W^{\alpha}$ of the $\alpha^{\rm th}$ spectroscopic
709: galaxy is calculated by comparing
710: the local density around the galaxy in the spectroscopic sample with
711: the density of the corresponding region in the photometric sample.
712: The local density is evaluated by counting the number of
713: nearest neighbors using the distance measured in the space of photometric
714: observables, as in Eq.~(\ref{nndef}). We fix the number of spectroscopic
715: neighbors, $N_{\rm S}$, which determines the distance $d_{\rm max}$
716: to the $N_{\rm S}^{\rm th}$-nearest spectroscopic neighbor.
717: We then find the number of neighbors $N_{\rm P}$ in the photometric
718: sample within the same distance $d_{\rm max}$ of the spectroscopic
719: galaxy. Up to an arbitrary normalization factor, the weight is defined as
720:
721: \begin{eqnarray}
722: W^{\alpha} \sim \frac{N_{\rm P} }{ N_{\rm S} } ~.
723: \label{eqn:weight}
724: \end{eqnarray}
725:
726: \noindent For our estimates, we chose $N_{\rm S}=20$, which provides a good
727: match of the weighted spectroscopic distributions of magnitudes
728: and colors to those of the photometric sample. We note that if
729: additional cuts in magnitude or color are applied to the photometric
730: sample, then this procedure must be repeated for the newly selected photometric
731: sample.
732: More details and tests of this method and comparisons with
733: other methods for estimating the
734: underlying redshift distribution (e.g., deconvolving the error distribution
735: from the \zphot \ distribution) will be presented
736: separately \citep{lim07}.
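
A brute-force Python sketch of the weighting procedure of Eq.~(\ref{eqn:weight}) is given below. It is an illustration only: the function names {\tt neighbor\_weights} and {\tt weighted\_redshift\_histogram} are hypothetical, the sketch assumes that a galaxy is not counted as its own neighbor, and it normalizes the weights to sum to one, which is one possible choice for the arbitrary normalization.

```python
import numpy as np


def neighbor_weights(spec_obs, phot_obs, n_s=20):
    """Weights proportional to N_P / N_S, normalized to sum to one.

    spec_obs, phot_obs: arrays of shape (n_objects, n_observables).
    Requires more than n_s spectroscopic objects.
    """
    weights = np.empty(len(spec_obs))
    for a, x in enumerate(spec_obs):
        d2_spec = np.sum((spec_obs - x) ** 2, axis=1)
        d2_spec[a] = np.inf  # assumption: exclude the galaxy itself
        # Squared distance to the N_S-th nearest spectroscopic neighbor.
        d2_max = np.partition(d2_spec, n_s - 1)[n_s - 1]
        # Number of photometric objects within the same distance d_max.
        d2_phot = np.sum((phot_obs - x) ** 2, axis=1)
        n_p = np.count_nonzero(d2_phot <= d2_max)
        weights[a] = n_p / n_s
    return weights / weights.sum()


def weighted_redshift_histogram(spec_z, weights, bins):
    """Weighted z_spec histogram, normalized to unit area."""
    return np.histogram(spec_z, bins=bins, weights=weights, density=True)
```

The double loop over samples costs of order $N_{\rm spec} \times N_{\rm phot}$ operations; for samples of the size considered here a spatial index such as a k-d tree would be the natural replacement.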
737:
738:
739: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
740: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
741:
742: \begin{figure*}
743: \begin{center}
744: \begin{minipage}[t]{46mm}
745: \begin{center}
746: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4a.eps}}
747: \end{center}
748: \end{minipage}
749: \begin{minipage}[t]{46mm}
750: \begin{center}
751: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4b.eps}}
752: \end{center}
753: \end{minipage}
754: \begin{minipage}[t]{46mm}
755: \begin{center}
756: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4c.eps}}
757: \end{center}
758: \end{minipage}
759: \begin{minipage}[t]{46mm}
760: \begin{center}
761: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4d.eps}}
762: \end{center}
763: \end{minipage}
764: \begin{minipage}[t]{46mm}
765: \begin{center}
766: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4e.eps}}
767: \end{center}
768: \end{minipage}
769: \begin{minipage}[t]{46mm}
770: \begin{center}
771: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4f.eps}}
772: \end{center}
773: \end{minipage}
774: \begin{minipage}[t]{46mm}
775: \begin{center}
776: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4g.eps}}
777: \end{center}
778: \end{minipage}
779: \begin{minipage}[t]{46mm}
780: \begin{center}
781: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4h.eps}}
782: \end{center}
783: \end{minipage}
784: \begin{minipage}[t]{46mm}
785: \begin{center}
786: \resizebox{46mm}{!}{\includegraphics[angle=0]{f4i.eps}}
787: \end{center}
788: \end{minipage}
789: \end{center}
790: \caption{ $z_{\rm phot}$ versus $z_{\rm spec}$ for the validation set for
791: different ranges of $r$ magnitude and for different photo-z techniques.
792: {\it Left column:} objects with $r<20$; {\it middle column:} objects with $r>20$;
793: {\it right column:} all objects.
794: {\it Top row:} ANN case D1, where the input photometric data comprise
795: the 5 magnitudes ($ugriz$) and the 5 concentration parameters, and the training
is split into 5 bins of $r$ magnitude.
797: {\it Middle row:} ANN case CC2, where the input data are
798: the 4 colors $u-g$, $g-r$, $r-i$, $i-z$, and 3 concentration parameters $c_gc_rc_i$.
799: {\it Bottom row:} results for the NNP method, where the input data are
800: the 5 magnitudes and 5 concentration parameters.
801: In all cases, the photo-z methods
802: used a training set with $\sim 320,000$ objects, and the derived solutions were
803: applied to an independent validation set with $\sim 309,000$ objects and
804: $r < 22$, reflecting the magnitude limit of the photometric sample.
805: The solid line in each panel indicates $z_{\rm phot}=z_{\rm spec}$; the
806: dashed and dotted lines show the 68\% and 95\% confidence regions as a function
807: of $z_{\rm spec}$.
808: The points display results for a random $10\%$ subset of the validation set in
809: each magnitude range.
810: }
811: \label{zpzs_valid_all}
812: \end{figure*}
813:
814: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
815: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
816: \section{Results} \label{res}
817: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
818: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
819:
820: \subsection{Photometric redshifts}
821: \label{res-photoz}
822:
823: The photo-z precision (variance) and accuracy (bias) are
824: limited by a number of factors. There are
825: intrinsic degeneracies in
826: magnitude-redshift space: low-luminosity, intrinsically red galaxies at low
827: redshift can have apparent magnitudes similar to those of high-luminosity,
828: intrinsically blue galaxies at high redshift.
829: This natural degeneracy is amplified by
830: photometric errors, since magnitude uncertainties
831: propagate to photo-z errors.
832: In addition to these observational limitations, which are
833: determined by the photometric precision and the number of passbands of a survey,
834: the photo-z estimator itself may have inherent limitations. For example,
835: for training set methods, the size and representativeness of the training
836: set are important factors, as are the number of parameters or weights in
837: the fitting functions.
838:
839: To test the quality of the photo-z estimates,
840: we use four photo-z performance metrics.
841: The first two metrics are the photo-z bias, $z_{\rm bias}$, and the photo-z {\it rms}
842: scatter, $\sigma$, both averaged over all $N$ objects in the validation
843: set, defined by
844:
845: \begin{eqnarray}
846: z_{\rm bias}&=&\frac{1}{N}\sum_{i=1}^{N}\left( z_{\rm phot}^{i}-z_{\rm spec}^{i}\right) ~, \\
847: \sigma^2&=&\frac{1}{N}\sum_{i=1}^{N}\left(z_{\rm phot}^{i}-z_{\rm spec}^{i}\right)^2 ~.
848: \end{eqnarray}
849:
850: \noindent The third performance metric, denoted by $\sigma_{68}$, is
851: the range containing $68\%$ of the validation set objects in the distribution of
852: $\delta z = z_{\rm phot}-z_{\rm spec}$. This metric is useful because
853: the probability distribution function
854: $P(\delta z)$ is in general non-Gaussian and asymmetric (for a Gaussian
855: distribution, $\sigma$ and $\sigma_{68}$ coincide). Explicitly, $\sigma_{68}$ is
856: defined by the value of $|z_{\rm phot} - z_{\rm spec}|$ such that 68\% of the objects have $|z_{\rm phot} - z_{\rm spec}| < \sigma_{68}$.
857: We also use the $95\%$ region $\sigma_{95}$, defined similarly.
858: In addition to these global metrics, we also define local versions of them
859: in bins of redshift or magnitude.
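
For concreteness, the four global metrics defined above can be computed as in the following Python sketch. The function name {\tt photoz\_metrics} and the dictionary keys are hypothetical; the definitions follow the equations and text of this section.

```python
import numpy as np


def photoz_metrics(z_phot, z_spec):
    """Global photo-z bias, rms scatter, and 68%/95% regions."""
    dz = np.asarray(z_phot) - np.asarray(z_spec)
    abs_dz = np.abs(dz)
    return {
        "z_bias": dz.mean(),
        "sigma": np.sqrt(np.mean(dz ** 2)),
        # Value of |dz| below which 68% (95%) of the objects fall.
        "sigma_68": np.percentile(abs_dz, 68.0),
        "sigma_95": np.percentile(abs_dz, 95.0),
    }
```

The local versions follow by applying the same function to the subset of objects that fall in each redshift or magnitude bin.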
860:
861: \begin{deluxetable}{llcc}
862: \tablewidth{0pt}
863: \tablecaption{Summary of ANN cases}
864: \startdata
865: \hline
866: \hline
867: \multicolumn{1}{c}{Case} & \multicolumn{1}{c}{Inputs/Description} & \multicolumn{1}{c}{$\sigma$} & \multicolumn{1}{c}{$\sigma_{68}$}\\
868: \hline
869: O1& $ugriz$ &0.0525 & 0.0229\\
870: C1& $ugriz$ + $c_uc_gc_rc_ic_z$ &0.0519 & 0.0224\\
871: D1& $ugriz$ + $c_uc_gc_rc_ic_z$. Split training&0.0519 & 0.0209\\
872: CC1&$u-g$, $g-r$, $r-i$, $i-z$ &0.0668 & 0.0272\\
873: CC2&$u-g$, $g-r$, $r-i$, $i-z$ + $c_gc_rc_i$ &0.0593 & 0.0245\\
874: \enddata
875: \label{table:method}
876: \tablecomments{Photo-z performance metrics $\sigma$ and $\sigma_{68}$
877: for the validation set using different input parameters
878: (magnitudes, colors, and concentration indices) and training procedures.}
879: \end{deluxetable}
880:
881: To search for an optimal photo-z estimator, we computed
882: photo-z's using the ANN method with
883: different combinations of input photometric observables. Five of
884: these combinations are listed in Table \ref{table:method}.
885: In the first case, dubbed O1, the training and photo-z estimation
886: are carried out using only the five magnitudes $ugriz$. In case C1,
887: we use the five magnitudes and the five concentration indices
888: $c_uc_gc_rc_ic_z$ as the input parameters. In case CC1, we
889: use only the four colors
890: $u-g$, $g-r$, $r-i$, and $i-z$. In case CC2, we combine the
891: four colors with
892: the concentration indices $c_gc_rc_i$ in the $gri$ filters.
893: Finally, in case D1, we use the $ugriz$ magnitudes
894: and the $c_uc_gc_rc_ic_z$ concentration indices, but we split the
895: training set and the photometric sample into 5 bins of $r$ magnitude and
896: perform separate ANN fits in each bin.
897: In all five cases, we use an ANN with three hidden layers and tune
898: the number of hidden nodes to keep the total
899: number of degrees of freedom of the network roughly the same for all cases.
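
The input combinations for the five cases can be assembled as in the sketch below. It only builds the input arrays and does not implement the ANN itself. The function names {\tt build\_inputs} and {\tt r\_bin\_index}, the column order ($u,g,r,i,z$), and any bin edges passed to {\tt r\_bin\_index} are assumptions made for illustration; the bin edges used for case D1 are not specified in the text above.

```python
import numpy as np


def build_inputs(mags, conc, case):
    """Input array for an ANN case; columns of mags and conc are u, g, r, i, z."""
    colors = mags[:, :-1] - mags[:, 1:]  # u-g, g-r, r-i, i-z
    if case == "O1":
        return mags
    if case in ("C1", "D1"):  # D1 uses the C1 inputs, fit separately per r bin
        return np.hstack([mags, conc])
    if case == "CC1":
        return colors
    if case == "CC2":
        return np.hstack([colors, conc[:, 1:4]])  # c_g, c_r, c_i
    raise ValueError("unknown case: " + str(case))


def r_bin_index(mags, edges):
    """Index of the r-magnitude bin of each object (edges are hypothetical)."""
    return np.digitize(mags[:, 2], edges)
```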
900:
901:
902: Table~\ref{table:method} provides a summary of the performance results of the
903: different ANN cases.
904: We find that using concentration indices in addition to magnitudes
905: (C1 vs. O1) helps break some degeneracies and reduces the
906: photo-z scatter by a few percent.
907: Using only colors (CC1) degrades the photo-z performance by as much as 20\%,
908: mostly because the degeneracy between intrinsically red, nearby galaxies
909: and intrinsically blue, distant galaxies (with red observed colors)
910: cannot be broken.
911: Adding concentration indices to color-only training (CC2)
912: helps break such a degeneracy, because the concentration index correlates
913: with galaxy type and hence intrinsic color. Of the five,
914: case CC2 also yields the most realistic photometric redshift
915: distribution for the photometric sample (see \S \ref{subsec:red_dist}).
916: Finally, splitting the training set and photometric sample into
917: magnitude bins (D1) produces
918: results with the best performance metrics ($\sigma$ and $\sigma_{68}$) of
919: all the ANN cases we have tested.
920: We choose D1 and CC2 as the best ANN cases and describe their
921: results in more detail below; their outputs for the photometric sample
922: are included in the public DR6 database.
923:
924: In Fig.~\ref{zpzs_valid_all}, we plot photometric redshift, \zphot,
925: for all objects in the validation set vs. true
926: spectroscopic redshift, \zspec, for the different photo-z methods
927: and cases and in different ranges of $r$ magnitude.
928: The top row shows results for ANN case D1, the middle row shows
929: the performance of ANN case CC2, and the bottom row shows results for
930: the NNP method using magnitudes and concentration indices as the input
931: parameters. In each panel,
932: the values of the corresponding global
933: photo-z performance metrics $\sigma$ and $\sigma_{68}$ are shown.
934: The redshift bias $z_{\rm bias}$ is typically much smaller than $\sigma$ or
935: $\sigma_{68}$, since the photo-z methods are designed to minimize it (see
936: Fig. \ref{plot:statvsm}). In each panel of Fig. \ref{zpzs_valid_all},
937: the solid line traces
938: $z_{\rm phot}=z_{\rm spec}$, i.e., the line
939: for a perfect photo-z estimator.
940: The dashed and dotted lines show the corresponding $68\%$ and $95\%$ regions,
941: defined as above but in $z_{\rm spec}$ bins. Although
942: each photo-z method probes the
943: hypersurface defined by the photometric observables and redshift in a different
944: way,
945: they produce very similar results, suggesting that our results are
946: limited not by the photo-z technique employed but by the
947: intrinsic degeneracies in magnitude-concentration-redshift space and
948: by the photometric errors.
949:
950: \begin{figure}
951: \resizebox{85mm}{!}{\includegraphics[angle=0]{f5.eps}}
952: \caption{The performance metrics
953: $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$ for the ANN D1 and CC2
954: validation sets are shown
955: as a function of $r$ magnitude.
956: CC2 performs relatively poorly for bright objects ($r < 16$), where the color-redshift
957: relation is contaminated by faint objects with similar colors. In D1,
958: this problem is alleviated by the effective magnitude prior imposed by
959: the training set. At faint magnitudes, the performance degrades as the photometric
960: errors increase.
961: }
962: \label{plot:statvsm}
963: \end{figure}
964:
965:
966: \begin{figure*}
967: \begin{center}
968: \begin{minipage}[t]{81mm}
969: \begin{center}
970: \resizebox{81mm}{!}{\includegraphics[angle=0]{f6a.c.eps}}
971: \end{center}
972: \end{minipage}
973: \begin{minipage}[t]{81mm}
974: \begin{center}
975: \resizebox{81mm}{!}{\includegraphics[angle=0]{f6b.c.eps}}
976: \end{center}
977: \end{minipage}
978: \end{center}
979: \caption{Performance metrics
980: $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$ for the ANN D1 and CC2 validation sets
981: are shown as a function of $z_{\rm spec}$ for $r<20$ and $r>20$.
982: The increased scatter for objects with $z > 0.6$ is due to
983: the 4000 \AA \ break shifting out of the $r$ passband at
984: around $z = 0.7$; beyond that redshift, the estimator effectively relies
985: on only two passbands ($i$ and $z$) to determine the photo-z's. Note that
986: faint objects ($r > 20$) have worse scatter at low redshifts for
987: both cases. This is likely due to the fact that the faint, low-redshift
988: objects in the validation set are predominantly blue
989: dwarf or irregular galaxies that do not have
990: strong 4000 \AA \ breaks; in this case, the photo-z estimator must rely on less
991: pronounced spectral features, resulting in larger photo-z scatter.
992: }
993: \label{plot:statvsz}
994: \end{figure*}
995:
996: \begin{figure*}
997: \begin{center}
998: \begin{minipage}[t]{81mm}
999: \begin{center}
1000: \resizebox{81mm}{!}{\includegraphics[angle=0]{f7a.c.eps}}
1001: % BW \resizebox{81mm}{!}{\includegraphics[angle=0]{plots/grVSz.rl20.ps}}
1002: \end{center}
1003: \end{minipage}
1004: \begin{minipage}[t]{81mm}
1005: \begin{center}
1006: \resizebox{81mm}{!}{\includegraphics[angle=0]{f7b.c.eps}}
1007: % BW \resizebox{81mm}{!}{\includegraphics[angle=0]{plots/grVSz.rg20.ps}}
1008: \end{center}
1009: \end{minipage}
1010: \end{center}
1011: \caption{
$g-r$ color vs. spectroscopic redshift for galaxies in the
1013: validation set: {\it left panel:} galaxies with $r<20$; {\it right panel:}
1014: galaxies with $r>20$. The solid curves show expected color-redshift relations of
1015: galaxies with different SED types, calculated using the \cite{col80}
1016: spectral templates. The different
1017: colors (shades of grey)
1018: %BW: symbol and greyscale types
1019: indicate galaxies from the different spectroscopic surveys contributing
1020: to the validation set. The 2SLAQ objects, denoted by red triangles, were
1021: selected to be mostly early-type galaxies. They are
responsible for the minimum in $\sigma$ vs. $z_{\rm spec}$
1023: for the $r>20$ subsample in Fig. \ref{plot:statvsz}.
1024: }
1025: \label{plot:grvsz}
1026: \end{figure*}
1027:
1028:
1029: In Figs. \ref{plot:statvsm} and \ref{plot:statvsz}, we show the performance
1030: metrics
1031: $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$ as a function of $r$ magnitude
1032: and $z_{\rm spec}$ for the validation set for the two preferred ANN cases.
1033: We see that the photo-z precision degrades considerably
1034: for objects with $r > 20$.
1035: This increased scatter is expected, since the relative photometric errors
1036: increase as the nominal detection limit of the SDSS photometry is approached
1037: (see Table \ref{propphot}). While the bias for CC2 increases at $r<17$,
we note that only a very small fraction of the objects in the photometric sample
are that bright.
1040: As a function of redshift, $\sigma$ and $\sigma_{68}$ increase dramatically
1041: beyond $z \sim 0.6$
1042: for the validation set.
1043: For the $r < 20$ part of the sample, the number of spectroscopic objects with
1044: $z > 0.6$ is simply too small
1045: to characterize the redshift-magnitude surface, as shown in
1046: the left panel of Fig. \ref{plot:grvsz}. For the
1047: faint objects ($r > 20$), the scatter is low for $z$ between 0.4 and
1048: 0.6 and increases outside of that range.
It is important to note that the photo-z performance metrics were
calculated independently of spectral type.
Since the neural network and the training set were not optimized
for any specific galaxy population (e.g., galaxies in clusters), it is possible
that certain galaxy types may have photo-z's with worse (or better)
biases and dispersion.
1055:
1056:
1057: In Figure~\ref{plot:grvsz}, we plot $g-r$ color versus spectroscopic
1058: redshift for the validation set for both bright ($r<20$) and faint ($r>20$) galaxies.
1059: The 2SLAQ and DEEP2 galaxies are highlighted by different
1060: colors (shades of grey),
1061: %BW: shades of grey,
1062: and the expected color-redshift relations for the four spectral templates from
1063: \cite{col80}
1064: (from early to late types) are indicated by the solid lines.
1065: We see that for the faint sample, in the range $0.4 < z < 0.6$, the galaxies
1066: come mostly from the 2SLAQ survey, which used
1067: specific color cuts to select early-type galaxies at
1068: $z\sim0.5$. Because early-type galaxies have a well-defined
1069: 4000 \AA \ break feature, their photo-z's are well determined and
1070: their photo-z scatter is low.
1071: Outside of the range $0.4 < z < 0.6$, the validation set at faint magnitudes
1072: is dominated by bluer galaxies
1073: that do not have strong, broad spectral features, resulting in the
1074: larger photo-z scatter seen in Fig. \ref{plot:statvsz}.
1075:
1076: Fig.~\ref{plot:statvsz} shows that the common assumption that the
1077: photo-z scatter
1078: scales as $(1+z)$ is not consistent with our estimates for the SDSS sample.
1079: The functional form of the scatter versus redshift depends
1080: strongly on the underlying galaxy type distribution.
1081:
1082: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1083: \subsection{Redshift Distributions}
1084: \label{subsec:red_dist}
1085: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1086:
1087: So far, we have considered the scatter and bias of photo-z estimates.
1088: As discussed in \S \ref{estdist}, it is also of interest to consider
1089: the predicted photo-z distribution as a whole. Different photo-z estimators
1090: may achieve similar values for the metrics $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$,
1091: but predict different forms for the photo-z distribution of the photometric
1092: sample. As we shall see, this is the case with the two ANN cases D1 and CC2.
1093: We therefore define two additional performance metrics to quantify the
1094: quality of the predicted photo-z distribution.
1095: The first metric, $\sigma_{\rm dist}$, measures the {\it rms} difference between
1096: the binned $z_{\rm phot}$ and $z_{\rm spec}$ distributions of the validation set,
1097:
1098: \begin{eqnarray}
1099: \sigma^2_{\rm dist}&=&\frac{1}{N_{\rm bin}}\sum_{i=1}^{N_{\rm bin}}\left(P_{\rm phot}^{i}-P_{\rm spec}^{i}\right)^2,
1100: \end{eqnarray}
1101:
1102: \noindent where $P_{\rm phot}^{i}$ is the height of the
1103: $i^{\rm th}$ redshift bin of the $z_{\rm phot}$ distribution,
1104: $P_{\rm spec}^{i}$ is the height of the same redshift
1105: bin of the $z_{\rm spec}$ distribution, and $N_{\rm bin}$ is the total number
1106: of redshift bins used.
1107: Here we use $N_{\rm bin}=120$ equally spaced redshift bins running
1108: from $z=0$ to $z=1.2$.
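
A short Python sketch of the $\sigma_{\rm dist}$ calculation is shown below. The function name {\tt sigma\_dist} is hypothetical, and the sketch assumes that the bin heights $P^i$ are histogram densities normalized to unit area; a different normalization of the heights would rescale the result.

```python
import numpy as np


def sigma_dist(z_phot, z_spec, n_bin=120, z_min=0.0, z_max=1.2):
    """rms difference between the binned z_phot and z_spec distributions."""
    edges = np.linspace(z_min, z_max, n_bin + 1)  # equally spaced bins
    # Assumption: heights are unit-area densities. Objects outside
    # [z_min, z_max] are ignored by np.histogram.
    p_phot, _ = np.histogram(z_phot, bins=edges, density=True)
    p_spec, _ = np.histogram(z_spec, bins=edges, density=True)
    return np.sqrt(np.mean((p_phot - p_spec) ** 2))
```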
1109:
1110:
1111: The second redshift distribution
1112: metric we employ is the KS statistic $D$, the
1113: maximum value of the absolute difference between the two ($z_{\rm phot}$ and
1114: $z_{\rm spec}$) cumulative
1115: redshift distribution
1116: functions. An advantage of the KS statistic is that it does not require
1117: binning the data in redshift. However, our
1118: use of the KS statistic to quantify the difference between the $z_{\rm phot}$
1119: and $z_{\rm spec}$ distributions of the validation set likely does
1120: not adhere to formal statistical practice,
since it turns out that the probability for the KS statistic for both cases we consider
1122: is very close to zero \citep{pre92}.
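
The KS statistic $D$ can be evaluated directly from the two unbinned samples, as in the following sketch. The function name {\tt ks\_statistic} is hypothetical; the calculation is the standard two-sample statistic, the maximum absolute difference between the two empirical cumulative distribution functions.

```python
import numpy as np


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D from the empirical CDFs."""
    a = np.sort(np.asarray(sample_a))
    b = np.sort(np.asarray(sample_b))
    # The empirical CDFs only change at data points, so it is enough
    # to compare them at every observed value.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))
```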
1123:
1124: Table \ref{table_sigdist_ks} shows the values of
1125: $\sigma_{\rm dist}$ and of the KS statistic $D$ for the validation set for the
1126: D1 and CC2 ANN photo-z's, for different ranges of $r$ magnitude.
1127: Although the CC2 photo-z distribution is
1128: a worse overall match to the $z_{\rm spec}$ distribution for the
1129: validation set, it works better than D1 for $r>18$.
1130: Since the photometric sample
1131: is dominated by objects at $r>20$ (see Fig. \ref{dist.sdss}),
1132: these results suggest that CC2 should do a better job in
1133: estimating the redshift distribution of the photometric sample,
1134: even though D1 performs better by the standards of $z_{\rm bias}$ and
1135: $\sigma$.
1136:
1137:
1138: \begin{deluxetable}{cc|cc|ccc}
1139: \tablewidth{0pt}
1140: \tablecaption{$\sigma_{\rm dist}$ and KS statistic for Redshift distribution}
1141: \startdata
1142: \hline
1143: \hline
1144: \multicolumn{1}{c}{} & \multicolumn{1}{c|}{} & \multicolumn{2}{c|}{$\sigma_{\rm dist}$} & \multicolumn{2}{c}{KS statistic}\\
1145: \hline
1146: \multicolumn{1}{c}{} & \multicolumn{1}{c|}{$r$-mag bin} & \multicolumn{1}{c}{CC2} & \multicolumn{1}{c|}{D1} & \multicolumn{1}{c}{CC2} & \multicolumn{1}{c}{D1}\\
1147: \hline
1148: &$r < 18$ & 0.0392 & 0.0330 & 0.0632 & 0.0391& \\
1149: &$18<r<19$& 0.0390 & 0.0430 & 0.0520 & 0.0533& \\
1150: &$19<r<20$& 0.0391 & 0.0399 & 0.0366 & 0.0413&\\
1151: &$20<r<21$& 0.0403 & 0.0471 & 0.0363 & 0.0665&\\
1152: &$21<r<22$& 0.0652 & 0.0702 & 0.1051 & 0.1306&\\
1153: \hline
1154: &All & 0.0383 & 0.0338 & 0.0485 & 0.0307&
1155: \enddata
1156: \label{table_sigdist_ks}
1157: \tablecomments{$\sigma_{\rm dist}$ and KS statistic results for CC2 and D1 ANN photo-z's for the validation set.}
1158: \end{deluxetable}
1159:
1160:
1161: The redshift distributions for the validation set are shown in
1162: Fig.~\ref{dndz.valid} for the same bins of $r$ magnitude as in
1163: Table \ref{table_sigdist_ks}.
1164: The D1 and CC2 \zphot \ distributions are shown
1165: in color,
1166: % BW: shaded,
1167: and the solid curves correspond to the \zspec \ distributions.
1168: The similarities between the \zphot \ and \zspec \ distributions
1169: are consistent with the results of
1170: Table \ref{table_sigdist_ks}.
1171:
1172: In \S \ref{estdist}, we noted that the \zspec \ distribution of the
1173: spectroscopic sample, weighted to reproduce the color and magnitude
1174: distributions of the photometric sample, provides an estimate of the
1175: unknown redshift distribution of the photometric sample. The \zphot \
1176: distribution for the photometric sample, computed using ANN D1 or CC2, provides
1177: another estimate of the true redshift distribution for the photometric
1178: sample, but one that we know suffers from bias (e.g., Fig. \ref{plot:statvsm}).
1179: While we have not shown that the weighted \zspec \ estimate of the
1180: redshift distribution is unbiased, it has the advantage that it makes
1181: direct use of the statistical properties of the photometric sample, and
1182: we believe it is our best estimate of the photometric sample redshift distribution.
1183: Our final test of photo-z performance therefore compares the \zphot
1184: \ distribution for the photometric sample for the two ANN cases
1185: with the weighted \zspec \ distribution of the spectroscopic sample.
1186: Agreement between the weighted \zspec \ distribution and either one of the
1187: \zphot \ distributions does not guarantee that they are correct, but
1188: it at least provides a useful consistency check.
1189:
1190: In Fig.~\ref{dndz.photo} we show the estimated redshift distributions of a
1191: random subsample containing $\sim 1\%$ of the objects in the DR6
1192: photometric sample for both the CC2 and D1 ANN cases.
1193: The
1194: % BW: filled curves
1195: colored regions
1196: correspond to the \zphot \ distributions, and the solid lines indicate
1197: the weighted \zspec \ distribution of the spectroscopic sample.
1198: The \zphot \ distributions for CC2 are closer matches to
1199: the weighted \zspec \ distributions for $r>18$, and they do
1200: not show the peculiar features that the D1 photo-z distributions
1201: display, particularly at faint magnitudes. By the criterion of
1202: producing a more realistic redshift distribution for the photometric
1203: sample, the CC2 ANN estimator is preferred.
1204:
1205: \subsection{Photo-z Errors}
1206:
1207: In order to test the quality of our photo-z error estimates
1208: calculated with the NNE method, we introduce the concept of
1209: empirical error. For a set of objects (within the validation set) with similar
1210: NNE error,
1211: $\sigma_{z}^{\rm NNE}$, the empirical error is defined as the $68\%$
1212: width of the $|z_{\rm phot}-z_{\rm spec}|$ distribution for the set.
1213: If the NNE estimator works properly,
1214: objects with similar NNE error should have similar underlying
1215: error distributions, i.e.,
1216: the NNE error should correlate
1217: well with the empirical error.
1218:
1219: Fig.~\ref{erer} shows the performance of the photo-z error estimator
1220: by plotting the computed NNE error $\sigma_{z}^{\rm NNE}$ as a function
1221: of the corresponding empirical error for the validation set.
1222: Results are shown for the D1 and CC2 ANN photo-z's.
1223: The empirical error was calculated for bins containing $100$ objects
1224: with similar $\sigma_z^{\rm NNE}$.
1225: As expected, faint objects ($r > 20$) have larger errors than bright
1226: objects ($r < 20$).
1227: The NNE estimated error correlates well with the
1228: empirical error even for the faint objects, indicating that the
1229: error estimator works properly for all magnitudes.
1230: The bulk of the bright objects have $\sigma_z^{\rm NNE}$ in the range
1231: $0.01-0.04$, consistent with the overall {\it rms} photo-z scatter of
$\sigma \sim 0.03$ indicated in Fig.~\ref{zpzs_valid_all}.
1233: Likewise, faint objects have $\sigma_z^{\rm NNE}$ in the range $0.02-0.3$,
1234: while $\sigma \sim 0.13$ for those objects.
1235: The NNE error is therefore a robust indicator of an object's
1236: photo-z quality. In particular, we have carried out tests in which we
1237: cut objects with large NNE error from the sample and found that the
1238: remaining sample has smaller photo-z scatter and fewer catastrophic
1239: outliers. For applications in which
1240: photo-z precision is more important than
1241: completeness of the photometric sample, this can be a
1242: useful procedure.
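
The selection described above might be sketched as follows. The function name
{\tt apply\_nne\_cut}, its arguments, and in particular the threshold
{\tt max\_sigma} are hypothetical; the text does not prescribe a specific cut
value, and the appropriate choice depends on the application. The scatter is
computed here for objects with known spectroscopic redshifts, as in a
validation set.

\begin{verbatim}
```python
import numpy as np


def apply_nne_cut(z_phot, z_spec, sigma_nne, max_sigma):
    """Keep objects with sigma_nne < max_sigma.

    Returns a dict with the boolean selection mask, the fraction
    of objects kept, and the rms of z_phot - z_spec for the kept
    objects.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    sigma_nne = np.asarray(sigma_nne, dtype=float)

    keep = sigma_nne < max_sigma
    if not keep.any():
        raise ValueError("no objects pass the NNE error cut")

    resid = z_phot[keep] - z_spec[keep]
    return {
        "keep": keep,
        "fraction_kept": float(keep.mean()),
        "rms_scatter": float(np.sqrt(np.mean(resid ** 2))),
    }
```
\end{verbatim}

Comparing {\tt rms\_scatter} against {\tt fraction\_kept} for several values of
{\tt max\_sigma} gives a simple view of the trade-off between precision and
completeness.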
1243:
1244: \begin{figure*}
1245: \begin{center}
1246: \begin{minipage}[t]{81mm}
1247: \begin{center}
1248: \resizebox{81mm}{!}{\includegraphics[angle=0]{f8a.c.eps}}
1249: \end{center}
1250: \end{minipage}
1251: \begin{minipage}[t]{81mm}
1252: \begin{center}
1253: \resizebox{81mm}{!}{\includegraphics[angle=0]{f8b.c.eps}}
1254: \end{center}
1255: \end{minipage}
1256: \end{center}
1257: \caption{Redshift distributions for the galaxies in the
1258: validation set for different $r$ magnitude bins. {\it Left panels:} ANN D1;
1259: {\it right panels:} ANN CC2.
1260: The
1261: % BW: solidly
1262: colored regions indicate the ANN
1263: photo-z distributions, while the lines are
1264: the spectroscopic redshift distributions. By eye,
1265: both ANN cases recover the true redshift distributions of the
1266: validation set well, except
1267: in the faintest magnitude bin, where the photometric errors become large.
1268: }\label{dndz.valid}
1269: \end{figure*}
1270:
1271: \begin{figure*}
1272: \begin{center}
1273: \begin{minipage}[t]{81mm}
1274: \begin{center}
1275: \resizebox{81mm}{!}{\includegraphics[angle=0]{f9a.c.eps}}
1276: \end{center}
1277: \end{minipage}
1278: \begin{minipage}[t]{81mm}
1279: \begin{center}
1280: \resizebox{81mm}{!}{\includegraphics[angle=0]{f9b.c.eps}}
1281: \end{center}
1282: \end{minipage}
1283: \end{center}
1284: \caption{Estimated redshift distributions for a random subsample of
1285: 1\% of the galaxies in the
1286: DR6 photometric sample in different $r$-magnitude bins. {\it Left panels:}
1287: ANN D1; {\it right panels:} ANN CC2. Colors show the \zphot \ distributions.
1288: The lines show the estimated redshift distributions from the spectroscopic
1289: sample weighted to match the magnitude and color distributions of the
1290: photometric sample.
1291: Even though the two ANN cases correctly recover the
validation set redshift distribution (Fig.~\ref{dndz.valid}),
1293: their photo-z
1294: distributions for the photometric sample disagree. The photo-z distribution
1295: for D1 shows a peak at
1296: $z\sim0.4$ that results mainly from the $20 < r < 21$ bin.
1297: The CC2 distribution does not show such strong features, and in general it matches
1298: the weighted \zspec \ distribution better.
1299: }\label{dndz.photo}
1300: \end{figure*}
1301:
1302:
1303:
1304: \begin{figure*}
1305: \begin{center}
1306: \begin{minipage}[t]{81mm}
1307: \begin{center}
1308: \resizebox{81mm}{!}{\includegraphics[angle=0]{f10a.c.eps}}
1309: \end{center}
1310: \end{minipage}
1311: \begin{minipage}[t]{81mm}
1312: \begin{center}
1313: \resizebox{81mm}{!}{\includegraphics[angle=0]{f10b.c.eps}}
1314: \end{center}
1315: \end{minipage}
1316: \end{center}
1317: \caption{The estimated error from the NNE method, $\sigma_z^{\rm NNE}$, is
1318: shown against the empirical error for objects in the validation set.
1319: {\it Left panel:} D1 ANN; {\it right panel:} CC2 ANN.
1320: Each point corresponds to a bin
1321: of $100$ objects with similar $\sigma_z^{\rm NNE}$.
1322: The black squares show results for bright objects ($r < 20$),
1323: the red triangles for faint objects ($r > 20$). As expected, faint
1324: objects have larger errors, but
1325: the NNE error correlates well with the empirical error over the full magnitude range.
1326: }\label{erer}
1327: \end{figure*}
1328:
1329:
1330:
1331: In Fig.~\ref{gausser}, we plot the normalized error distribution,
1332: i.e., the distribution
1333: of $(z_{\rm phot}-z_{\rm spec})/\sigma_{z}^{\rm NNE}$, for objects
1334: in the spectroscopic sample, using the D1 ANN estimator.
1335: The solid black lines are the data, and the dotted red lines
1336: show Gaussian distributions with zero mean and unit variance.
1337: The upper panels show results for the galaxies in the SDSS Main
1338: and LRG spectroscopic samples. The lower panels show results for
1339: all validation-set galaxies, divided into bright
1340: ($r < 20$) and faint ($r > 20$) samples.
1341: These plots indicate that, averaged over the bulk of the spectroscopic
1342: sample, the photo-z estimates are nearly unbiased, the NNE error
1343: provides a good estimate of the true error, and the NNE error can be
1344: approximately interpreted as a Gaussian error in this average sense.
1345: Note that this does {\it not} imply that the photo-z error distributions in
bins of magnitude or redshift are unbiased Gaussians: Figs.~\ref{plot:statvsm}
1347: and \ref{plot:statvsz} show that they are not.
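
To make the comparison with the unit Gaussian explicit, the following short
derivation spells out what it would imply in the idealized case; the symbol
$\delta$ is introduced here purely for illustration. Writing the normalized
error as
\begin{displaymath}
\delta \equiv \frac{z_{\rm phot}-z_{\rm spec}}{\sigma_{z}^{\rm NNE}},
\end{displaymath}
a distribution of $\delta$ that exactly followed a Gaussian with zero mean and
unit variance would satisfy
\begin{displaymath}
\langle \delta \rangle = 0, \qquad
\langle \delta^{2} \rangle = 1, \qquad
P\left(|\delta| < 1\right) = {\rm erf}\left(1/\sqrt{2}\right) \simeq 0.683 ,
\end{displaymath}
so that about $68\%$ of objects would have
$|z_{\rm phot}-z_{\rm spec}| < \sigma_{z}^{\rm NNE}$. This is the same $68\%$
convention used to define the empirical error above.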
1348:
1349: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1350: \section{Query Flags and Caveats} \label{rec}
1351: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1352:
1353: When querying the SDSS data server to produce the photometric sample for
1354: which we estimated photo-z's, we set the most relevant flags needed to
1355: produce a clean galaxy sample.
1356: However, some applications may require more stringent selection of objects.
1357: We advise users of the catalog to read the documentation about producing a clean
1358: galaxy sample on the SDSS
1359: website\footnote{ {\tt http://cas.sdss.org/dr6/en/help/docs/algorithm.asp} }.
In particular, users should consider requiring the BINNED1 flag (object detected at $> 5\sigma$) and removing
objects with the NODEBLEND flag (object is a blend but deblending was not possible). The various PHOTO flags
are described in more detail on the above
website as well as in Appendix~\ref{query}.
1364:
1365: Finally, we note that the training of the photo-z estimators included only
1366: galaxies, not stars. As a result, photo-z estimates for
1367: stars that contaminate the photometric sample will be wrong, and cutting
1368: objects with low $z_{\rm phot}$ will not remove them. Our tests on
1369: star/galaxy separation in the photometric sample are briefly
1370: described in Appendix \ref{stargal}.
1371:
1372: \begin{figure}
1373: \begin{center}
1374: \begin{minipage}[t]{81mm}
1375: \begin{center}
1376: \resizebox{81mm}{!}{\includegraphics[angle=0]{f11.c.eps}}
1377: \end{center}
1378: \end{minipage}
1379: \end{center}
1380: \caption{
1381: Distributions of
1382: $(z_{\rm phot}-z_{\rm spec})/\sigma_{z}^{\rm NNE}$
1383: for objects in the spectroscopic sample, with photo-z's calculated
1384: using ANN D1; the
1385: results for ANN CC2 are very similar.
1386: The solid black lines are the data, and the dotted red lines are
1387: Gaussians with zero mean and unit variance. {\it Top left:} SDSS Main
1388: spectroscopic sample; {\it top right:} SDSS LRG sample; {\it bottom
1389: left:} validation-set galaxies with $r<20$; {\it bottom right:} validation-set
1390: galaxies with $r>20$. In all cases the photo-z errors
1391: are reasonably well modeled by Gaussian distributions.
1392: }\label{gausser}
1393: \end{figure}
1394:
1395:
1396: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1397: \section{Accessing the Catalog} \label{cat}
1398: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1399:
1400: The photo-z catalog can be accessed from the
1401: {\tt photoz2} table in the DR6 context on the
1402: SDSS CasJobs site, at {\tt http://casjobs.sdss.org/casjobs/}.
A query similar to the one in Appendix~\ref{query} provides all objects
for which we computed photo-z's.
1405: Alternatively, one can simply perform a query that searches for
1406: objects with a {\tt photoz2} entry.
1407:
1408: In addition to the {\tt photoz2} table in the SDSS CAS, an independent
1409: {\tt photoz} table is also available, for which the photo-z's
1410: have been computed using a template-based technique; see
1411: \cite{csa07, ade07}.
1412:
1413:
1414: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1415: \section{Conclusions}\label{con}
1416: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1417: We have presented a public catalog of photometric redshifts for the SDSS DR6
1418: photometric sample using
1419: two different photo-z estimates, CC2 and D1, based on the ANN method.
1420: As a consistency check, we have also calculated photo-z's using the NNP method,
1421: a nearest neighbor approach, which gives very good agreement with
1422: the ANN results.
1423: The CC2 and D1 photo-z results are comparable. For the validation set, the
1424: D1 photo-z estimates have lower photo-z scatter for bright galaxies ($r<20$),
1425: and scatter similar to but slightly smaller than that of
1426: CC2 for objects with $r>20$. Our tests indicate
1427: that the SDSS photo-z estimates are most reliable for galaxies
1428: with $r<20$
1429: and that the scatter increases significantly at fainter magnitudes.
1430: For faint galaxies ($r>20$), we recommend using the CC2 photo-z estimate,
1431: since the CC2 \zphot \ distribution most closely resembles the \zspec \
1432: distribution for the validation set and the weighted \zspec \ estimate
1433: for the redshift distribution of the photometric sample.
1434: For users who wish to use, for simplicity, a single photo-z estimator
1435: over the full
1436: magnitude range, we recommend using CC2.
1437:
1438: Finally, we have demonstrated that the NNE error estimator, included in the
1439: public catalog,
1440: provides a reliable measure of the photo-z errors and that the overall scaled
1441: photo-z errors are nearly Gaussian.
1442:
Funding for the DEEP2 survey has been provided by NSF grants AST-0071048 and AST-0071198. The data presented herein were obtained at the W.M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W.M. Keck Foundation. The DEEP2 team and Keck Observatory acknowledge the very significant cultural role and reverence that the summit of Mauna Kea has always had within the indigenous Hawaiian community and appreciate the opportunity to conduct observations from this mountain.
1444:
1445: Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is {\tt http://www.sdss.org/}.
1446:
1447: The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.
1448:
1449: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1450: \appendix
1451: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1452:
1453: \section{Data Query Code}\label{query}
1454:
1455: Here we provide the SDSS database query used to obtain part of the catalog containing
1456: the photometric sample used in this paper.
1457: Notice that the query requires the TYPE flag to be set to 3 (galaxies) and
1458: selects objects with dereddened model magnitude $r<22.0$ to reflect
1459: the SDSS nominal detection limit.
1460: The query to obtain objects with Right Ascension (RA) in the
1461: range $[0,170)$ is
1462:
1463: \vspace{0.8 cm}
1464:
1465: {\tt
1466: declare @BRIGHT bigint set @BRIGHT=dbo.fPhotoFlags('BRIGHT')
1467:
1468: declare @SATURATED bigint set @SATURATED=dbo.fPhotoFlags('SATURATED')
1469:
1470: declare @SATUR\_CENTER bigint set @SATUR\_CENTER=dbo.fPhotoFlags('SATUR\_CENTER')
1471: \vspace{0.5 cm}
1472:
1473: declare @bad\_flags bigint set @bad\_flags=(@SATURATED|@SATUR\_CENTER|@BRIGHT)
1474: \vspace{0.5 cm}
1475:
1476: select
1477:
1478: objID, ra, dec,type,dered\_u,dered\_g,dered\_r,dered\_i,dered\_z,
1479:
1480: petroR50\_u, petroR50\_g, petroR50\_r, petroR50\_i, petroR50\_z,
1481:
1482: petroR90\_u, petroR90\_g, petroR90\_r, petroR90\_i, petroR90\_z
1483:
1484:
1485:
1486: \vspace{0.5 cm}
1487:
1488:
1489: into MyDb.all\_ra\_0\_170
1490:
1491: FROM PhotoPrimary
1492:
1493: WHERE ((flags \& @bad\_flags)) = 0 AND (dered\_r<=22.0) AND (ra>=0.0) AND (ra<170.0)
1494:
1495: AND (type = 3)
1496:
1497: }
1498:
1499: \vspace{0.5cm}
1500:
1501: Here we provide a brief description of the flags used in the query:
BRIGHT indicates that an object is a duplicate detection of an object
detected at more than $200 \sigma$ significance; SATURATED indicates that an
1505: object contains one or more saturated pixels;
1506: SATUR\_CENTER indicates that the object center is close to at least one
1507: saturated pixel.
1508: Note that in selecting PRIMARY objects (using PhotoPrimary),
1509: we have implicitly selected objects
that either do {\it not} have the BLENDED flag set
or else have NODEBLEND set or nchild equal to zero.
1512: In addition, the PRIMARY catalog contains no BRIGHT objects, so
1513: the cut on BRIGHT objects in the query above is in fact redundant.
1514: BLENDED objects have multiple peaks detected within them, which PHOTO
1515: attempts to deblend into several CHILD objects.
1516: NODEBLEND objects are BLENDED but no deblending was attempted on them, because
1517: they are either too close to an EDGE, or too large, or one of
1518: their children overlaps an edge. A few percent of the objects in
1519: our photometric sample have NODEBLEND set; some users may wish to
1520: remove them.
1521:
1522: We also suggest that users require objects to have the
1523: BINNED1 flag set.
1524: BINNED1 objects were detected at $\geq 5 \sigma$ significance
1525: in the original imaging frame.
1526:
The SDSS webpage\footnote{ {\tt http://cas.sdss.org/dr5/en/help/docs/algorithm.asp?key=flags} } provides
1528: further recommendations about flags, which we strongly recommend that users read.
1529:
1530: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1531: \section{Tests on star-galaxy separation}\label{stargal}
1532: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1533:
1534: We used the SDSS database TYPE flag to select the galaxy
1535: photometric sample for our photo-z catalogs. To study the robustness
1536: of the TYPE flag in separating galaxies from stars, we also
1537: carried out tests using an independent star-galaxy
1538: classifier.
1539: Here we briefly describe both of these techniques and show the results
1540: obtained on photometric and spectroscopic samples.
1541:
1542: The TYPE flag is based on the star-galaxy separator in the SDSS PHOTO
1543: pipeline,
1544: described in \cite{lup01} and updated in \cite{aba04}.
For a given object, the pipeline computes the PSF and cmodel
magnitudes in each passband\footnote{ {\tt http://www.sdss.org/dr5/algorithms/photometry.html} },
1547: where the cmodel magnitude is a measure of the flux using a
1548: composite of the best-fit de Vaucouleurs and exponential models of
1549: the light profile. If the condition
1550:
1551: \begin{equation}
1552: m_{PSF}-m_{cmodel} > 0.145
1553: \end{equation}
1554:
1555: \noindent is satisfied, type is set to GALAXY for that band;
1556: otherwise, type is set to STAR. The object's global TYPE is
1557: determined by the same criterion, but now applied to the
1558: summed PSF and cmodel fluxes from all passbands in
1559: which the object is detected.
1560: \cite{lup01} show that an earlier version of this simple
1561: cut works at the $95\%$ confidence level for SDSS objects brighter
1562: than $r=21$.
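
As a rough illustration of what the threshold means in terms of flux, and
assuming conventional logarithmic magnitudes with a common zero point (an
assumption made here for simplicity; SDSS catalog magnitudes are asinh
magnitudes, which agree closely with this form only at high signal-to-noise),
the condition can be rewritten as
\begin{displaymath}
m_{PSF}-m_{cmodel} = 2.5\log_{10}\left(\frac{F_{cmodel}}{F_{PSF}}\right) > 0.145
\quad \Longleftrightarrow \quad
\frac{F_{cmodel}}{F_{PSF}} > 10^{0.145/2.5} \simeq 1.14 ,
\end{displaymath}
where $F_{PSF}$ and $F_{cmodel}$, symbols introduced only for this
illustration, denote the PSF and cmodel fluxes (for the global TYPE, the
fluxes summed over the detected passbands). Under this assumption, an object
is classified as a galaxy when its cmodel flux exceeds its PSF flux by more
than about $14\%$.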
1563:
1564: The second star-galaxy separator we tested is the galaxy probability
1565: defined in \cite{scr02}.
1566: The galaxy probability (hereafter $probgals$) is a Bayesian probability estimate that an object
1567: is a galaxy (and not a star), given the object's magnitudes and
1568: concentration parameter. Here
1569: the concentration parameter is {\it not}
1570: the ratio of Petrosian radii but is
1571: defined as the difference between an
1572: object's PSF and exponential-model $r$ magnitudes.
1573: This concentration parameter is close to zero for stars, is positive
1574: for bright galaxies, and approaches zero as galaxies become fainter.
1575:
1576: We conducted some simple tests to compare these classification schemes.
1577: If we set the Bayesian $probgals$ threshold to a value between 0.5 and 0.9,
1578: then both methods agree on the classification of
1579: more than $90\%$ of the objects for
1580: a random 1\% subset of the SDSS photometric sample.
1581: We also tested the methods on a spectroscopic sample of 29,229
1582: galaxies and stars (counting independent photometric
1583: measurements of each object) from the 2SLAQ and DEEP2 catalogs
1584: with $r < 22$.
1585: Defining stars as objects with $z_{\rm spec}<0.01$, the sample
1586: contains 24,541 galaxies and 4,688 stars. We wish to compare
1587: this spectroscopic ``truth table'' with the photometric classification
1588: of the two methods and with a combined method that classifies
1589: an object as a galaxy if and only if both separators classify it as a
1590: galaxy.
1591: For the purposes of this test, we say that
1592: the Bayesian scheme classifies an object as a galaxy if
1593: $probgals>0.5$. We define galaxy
1594: completeness as the ratio of correctly identified
1595: galaxies to the total number of galaxies in the spectroscopic
1596: sample.
1597: Purity is defined as the ratio of correctly identified galaxies
1598: to the number of objects identified (correctly or not) as galaxies by the
1599: classifier. The purity depends in part on the relative numbers of
1600: galaxies and stars in the spectroscopic sample.
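
The two definitions above, together with the combined classifier, can be
written compactly in Python. This is only a sketch under stated assumptions:
the function names {\tt completeness\_and\_purity} and
{\tt combined\_galaxy\_flag} and the boolean-array interface are hypothetical
and are not part of any SDSS software.

\begin{verbatim}
```python
import numpy as np


def completeness_and_purity(is_galaxy_true, is_galaxy_pred):
    """Galaxy completeness and purity of a classifier.

    is_galaxy_true: spectroscopic truth (True for galaxies).
    is_galaxy_pred: classifier output (True for galaxies).
    Returns NaN for a ratio whose denominator is zero.
    """
    truth = np.asarray(is_galaxy_true, dtype=bool)
    pred = np.asarray(is_galaxy_pred, dtype=bool)

    n_correct = np.count_nonzero(truth & pred)
    n_true = np.count_nonzero(truth)
    n_pred = np.count_nonzero(pred)

    completeness = n_correct / n_true if n_true else float("nan")
    purity = n_correct / n_pred if n_pred else float("nan")
    return completeness, purity


def combined_galaxy_flag(pred_a, pred_b):
    """Galaxy if and only if both separators say galaxy."""
    a = np.asarray(pred_a, dtype=bool)
    b = np.asarray(pred_b, dtype=bool)
    return a & b
```
\end{verbatim}

For instance, a classifier that labeled every object in the spectroscopic
sample described above (24,541 galaxies and 4,688 stars) as a galaxy would
have a completeness of 1 and a purity of $24{,}541/29{,}229 \simeq 0.84$,
which illustrates how the purity depends on the relative numbers of galaxies
and stars.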
1601:
1602:
1603: Fig.~\ref{compur} shows the completeness and purity of the
1604: resulting galaxy catalogs in bins of $r$
1605: magnitude for this spectroscopic sample.
1606: Overall, the Bayesian separator and PHOTO TYPE
1607: produce similar results for galaxy purity and completeness. Moreover,
1608: the agreement between the two classification methods is quite good on
1609: an object-by-object basis.
The
Bayesian separator with $probgals > 0.5$ achieves slightly higher
completeness and slightly lower purity.
1613: By varying the $probgals$ boundary, we could improve the purity of the
1614: Bayesian galaxy sample at the expense of degrading its completeness.
1615: We note that
1616: the best value of $probgals$ to use in defining a galaxy photometric
1617: sample depends on the scientific applications of the sample, i.e.,
1618: on whether completeness or purity is the more important feature.
1619: In statistical applications, instead of defining a galaxy sample one
1620: can also choose to weight objects by
1621: their Bayesian probability \citep{scr02}.
1622:
1623: Based on this test, we conclude that
1624: the photometric sample for which we have estimated photo-z's has
1625: better than 90\% galaxy purity.
1626:
1627: \begin{figure}
1628: \begin{center}
1629: \begin{minipage}[t]{81mm}
1630: \begin{center}
1631: \resizebox{81mm}{!}{\includegraphics[angle=0]{f12.c.eps}}
1632: \end{center}
1633: \end{minipage}
1634: \end{center}
1635: \caption{{\it Top panel:} completeness and {\it bottom panel:} purity
1636: for the
Bayesian and PHOTO TYPE galaxy classifications as well as for a combination
of the two, using a sample of objects with spectroscopic classification.
1639: Results for the Bayesian separator have the $probgals$ lower bound set
1640: to $0.5$.}
1641: \label{compur}
1642: \end{figure}
1643:
1644:
1645: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1646: \section{Photometric Redshifts for SDSS DR5}
1647: \label{photdr5}
1648: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1649:
1650: An earlier version of the photo-z catalog, produced for SDSS
1651: Data Release 5 (DR5), is publicly
1652: available on the SDSS DR5 website (and is also called
1653: {\tt photoz2}).
1654: The methods used to construct that photo-z catalog were similar to the
1655: ones employed here for DR6, but the latter incorporates a number of
1656: important
1657: improvements. Here we briefly outline the differences between the two.
1658: We {\it strongly} recommend use of the DR6 photo-z catalog instead of the
1659: DR5 catalog.
1660:
1661: The photometric galaxy sample selection has improved from DR5 to DR6,
1662: because we used more stringent cuts in defining the DR6 sample.
1663: The DR6 sample selection is described above in Appendix \ref{query}.
1664: The DR5 photometric galaxy sample selection required
1665: the cmodel and model $r$ magnitudes to lie in the ranges
1666: $r_{\rm cmodel} \in (14.0,22.0)$ and
1667: $r_{\rm model} \in (13.5,22.5)$, and also required the value of
1668: the smear polarizability \citep{she04} to be $m_r>0.8$. Also, for DR5,
1669: star-galaxy separation used the Bayesian estimator (see
1670: Appendix \ref{stargal}) with the value $probgals >0.8$, while for DR6
1671: we used PHOTO TYPE.
1672: The additional cuts used for the DR6 catalog have produced
1673: a cleaner and more reliable galaxy sample.
1674:
\begin{deluxetable}{ccc}
1676: \tablewidth{0pt}
1677: \tablecaption{DR5 Catalog $flag$}
1678: \startdata
1679: \hline
1680: \hline
1681: \multicolumn{1}{c}{$flag$}
& \multicolumn{1}{c}{Number of Galaxies}
1683: & \multicolumn{1}{c}{Object Description}\\
1684: \hline
1685: - & $86.1$ million &All \\
1686: 0 & $12.6$ million &\hspace{0.075 in}Complete \& bright\\
1687: 1 & $\hspace{0.06 in}0.6$ million &Incomplete \& bright\\
1688: 2 & $59.0$ million &Complete \& faint \\
1689: 3 & $13.9$ million &\hspace{-0.075 in}Incomplete \& faint \\
1690: \enddata
1691: \label{tableflags}
1692: \tablecomments{The flag scheme for the DR5 catalog is based on object
1693: detection in some/all passbands and the $r$ magnitude. Incomplete objects are undetected in
1694: at least one of the passbands ($ugriz$) and faint objects have $r>20$.
1695: }
1696: \end{deluxetable}
1697:
\begin{deluxetable}{ccc}
1699: \tablewidth{0pt}
1700: \tablecaption{DR6 Catalog $flag$}
1701: \startdata
1702: \hline
1703: \hline
1704: \multicolumn{1}{c}{$flag$}
& \multicolumn{1}{c}{Number of Galaxies}
1706: & \multicolumn{1}{c}{Object Description}\\
1707: \hline
1708: - & $77.4$ million &All \\
1709: 0 & $11.5$ million &bright\\
1710: 2 & $65.9$ million &faint \\
1711: \enddata
1712: \label{tableflags6}
\tablecomments{The $flag$ scheme for the DR6 catalog is based solely
on the $r$ magnitude: faint objects have $r>20$.
1715: }
1716: \end{deluxetable}
1717:
1718: The DR5 photo-z catalog included
1719: a number of flags describing the expected photo-z
1720: quality, shown in Table \ref{tableflags}.
1721: These flags were based on the detection or non-detection of the object in
1722: all passbands
1723: and on the value of the $r$ model magnitude. An object was classified
1724: as bright (faint) if $r<20$ ($r>20$). An object was flagged as ``incomplete''
1725: if it was not detected in all five SDSS passbands. Table \ref{tableflags}
1726: shows the corresponding flag values and the number of objects assigned
1727: each flag value. For the DR6 sample, given the stricter
1728: sample selection, a very small number of objects would have been
1729: classified as incomplete by the definition above, and they have
1730: been removed from the sample. As a result, for DR6, we only
1731: supply the bright/faint flag, as shown in Table \ref{tableflags6}.
1732:
1733: The spectroscopic training set used for the DR6 photo-z catalog
1734: has important additions compared to
1735: the one used for the DR5 catalog. In particular,
1736: for DR6 we added the DEEP2 spectroscopic catalog (which became
1737: publicly available), which made the training set more complete
1738: at faint magnitudes.
We also applied more stringent spectroscopic quality cuts
to the training set used for DR6.
1741:
1742: Unlike the DR5 training set, the DR6 training set does not contain
1743: objects from the SDSS ``special'' plates, extra spectroscopic observations
1744: designed to target specific objects for various scientific studies \citep{ade06}.
1745: In our tests, we find that
1746: the lack of special plates does not result in any degradation of the
1747: photo-z quality.
1748:
1749: The photo-z algorithm also changed from DR5 to DR6: we increased the
1750: number of hidden-layer nodes in the ANN and we added the concentration
1751: indices to the data inputs.
1752: Our tests indicated that this leads to
1753: improved photo-z performance according to our metrics.
1754: In addition, the CC2 method differs from DR5 photo-z's further in that
1755: CC2 uses only the color information and not the raw magnitudes.
For general-purpose, full-sample photo-z's, we recommend using CC2
photo-z's over both DR5 and D1 photo-z's.
1758: Finally,
1759: we have carried out more extensive tests of the DR6 photo-z's than
1760: were done for DR5, increasing our confidence in the robustness of
1761: the photo-z estimates.
1762:
1763: \bibliographystyle{apj}
1764: \bibliography{ms}
1765:
1766: \end{document}
1767: