
1: % $Id: sdss.tex,v 1.104 2007/07/28 02:07:43 oyachai Exp $

2:

3: %\documentclass[12pt,preprint]{aastex}

4: \documentclass[11pt,preprint]{emulateapj}

5: \usepackage{graphicx,natbib}

6: \usepackage{url}

7: \usepackage{color}

8: %\usepackage{amssymb}

9:

10: \citestyle{aa}

11:

12: \newcommand{\zphot}{$z_{\rm phot}$}

13: \newcommand{\zspec}{$z_{\rm spec}$}

14: \newcommand{\sigmoid}{{\rm s}}

15:

16:

17: \begin{document}

18:

19: \title{A Galaxy Photometric Redshift Catalog for the Sloan Digital Sky Survey Data Release 6}

20:

21: \author{

22: Hiroaki Oyaizu$^{1,2}$,

23: Marcos Lima$^{2,3}$,

24: Carlos E. Cunha$^{1,2}$,

25: Huan Lin$^{4}$,

26: Joshua Frieman$^{1,2,4}$,

27: Erin S. Sheldon$^{5}$

28: }

29:

30: \affil{

31: ${}^{1}$Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637 \\

32: ${}^{2}$Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637 \\

33: ${}^{3}$Department of Physics, University of Chicago, Chicago, IL 60637 \\

34: ${}^{4}$Center for Particle Astrophysics, Fermi National Accelerator Laboratory, Batavia, IL 60510 \\

35: ${}^{5}$Center for Cosmology and Particle Physics and Department of Physics, New York University, New York, NY 10003 \\

36: }

37:

38:

39: %\date{\today}

40:

41: %-----------------------------------------------------------------------------

42:

43: \begin{abstract}

44:

45: We present and describe a catalog of galaxy photometric redshifts (photo-z's)

46: for the Sloan Digital Sky Survey (SDSS) Data Release 6 (DR6).

47: We use the Artificial Neural Network (ANN) technique to calculate photo-z's

48: and the Nearest Neighbor Error (NNE) method to estimate photo-z errors for

49: $\sim$ 77 million objects classified as galaxies in DR6 with $r < 22$.

50: The photo-z and photo-z error estimators are trained and validated on a

51: sample of $\sim 640,000$ galaxies that have SDSS photometry and

52: spectroscopic redshifts measured by SDSS, 2SLAQ, CFRS, CNOC2, TKRS,

53: DEEP, and DEEP2.

54: For the two best ANN variants we have tried,

55: 68\% of the galaxies in the validation set have a photo-z

56: error smaller than

57: $\sigma_{68} = 0.021$ and $0.024$, respectively.

58: After presenting our results and quality tests, we provide a short guide

59: for users accessing the public data.

60:

61: \end{abstract}

62:

63: \keywords{photometric redshifts sdss -- Sloan Digital Sky Survey}

64:

65: %\maketitle

66: %------------------------------------------------------------------------------

67:

68:

69: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

70: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

71: \section{Introduction}\label{int}

72: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

73: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

74:

75: While spectroscopic redshifts have now been measured for over one million

76: galaxies, in recent years

77: digital sky surveys have obtained multi-band imaging

78: for roughly a hundred million galaxies. Deep, wide-area surveys planned for

79: the next decade will increase the number of galaxies with

80: multi-band photometry to a few billion. Due to technological and financial

81: constraints, obtaining spectroscopic redshifts for more than a

82: small fraction of these galaxies will remain impractical for the foreseeable

83: future. As a result, over the last decade substantial effort has gone into

84: developing photometric redshift (photo-z) techniques, which use

85: multi-band photometry to estimate approximate galaxy redshifts. For many

86: applications in extragalactic astronomy and cosmology, the resulting

87: photometric redshift precision is sufficient for the science goals at

88: hand, provided one can accurately characterize the uncertainties in the

89: photo-z estimates.

90:

91: Two broad categories of photo-z estimators are in wide use:

92: template-fitting and training set methods. In template-fitting, one

93: assigns a redshift to a galaxy by finding

94: the redshifted spectral energy distribution (SED), selected

95: from a library of templates,

96: that best reproduces the observed fluxes in the broadband filters.

97: By contrast, in the training set approach, one

98: uses a training set of galaxies with

99: spectroscopic redshifts and photometry to derive an empirical relation

100: between photometric observables (e.g., magnitudes, colors, and morphological

101: indicators) and redshift.

102: Examples of empirical methods include Polynomial Fitting \citep{con95b},

103: the Nearest Neighbor method \citep{csa03},

104: the Nearest Neighbor Polynomial (NNP) technique \citep{cun07},

105: Artificial Neural Networks (ANN) \citep{col04,van04,dab07}, and

106: Support Vector Machines \citep{wad04}. When a large spectroscopic

107: training set that is representative of the photometric data set to be

108: analyzed is

109: available, training set techniques typically outperform template-fitting

110: methods, in the sense that the photo-z estimates have smaller scatter

111: and bias with respect to the true redshifts \citep{cun07}. On the

112: other hand, template-fitting can be applied to a photometric sample

113: for which relatively few spectroscopic analogs exist.

114: For a comprehensive review and comparison of photo-z methods,

115: see \cite{cun07}.

116:

117: In this paper, we present a publicly available galaxy photometric redshift

118: catalog for the Sixth Data Release (DR6) of the Sloan Digital Sky

119: Survey (SDSS) imaging catalog \citep{bla03b,eis01,gun98,ive04,str02,yor00}.

120: We use the ANN photo-z method, which we have shown to

121: be a superior training set method \citep{cun07}, and briefly compare the

122: results using different photometric observables.

123: We also compare the ANN results with those from NNP, an empirical

124: method that achieves performance similar to that of the ANN method \citep{cun07}. %briefly with other methods.

125: Since the SDSS photometric catalog covers a large area of sky, a number

126: of deep spectroscopic galaxy samples with SDSS photometry are available

127: to use as training sets, as shown in Fig.~\ref{dist.sdss}.

128: In combination, these spectroscopic samples cover the full apparent

129: magnitude range of the SDSS photometric sample.

130:

131: The paper is organized as follows.

132: In \S \ref{sel} we briefly describe the SDSS DR6 photometric catalog

133: and the selection criteria used

134: to obtain the galaxy photometric sample from the catalog.

135: In \S \ref{tra} we describe the spectroscopic catalogs used

136: to construct the photo-z training and validation sets.

137: In \S \ref{met} we outline the photo-z methods as well as the

138: photo-z error estimator technique applied to the galaxy sample.

139: Statistical results for photometric redshift performance, errors,

140: and redshift distributions

141: are presented in \S \ref{res}. In \S \ref{rec}

142: we make recommendations for possible

143: additional cuts on the photo-z catalog based on our

144: own flags and those in the SDSS database.

145: In \S \ref{cat} we briefly describe how to access the

146: photo-z catalog from the public SDSS data server, and in \S \ref{con} we

147: present our conclusions. For completeness, Appendix \ref{query}

148: provides the database query used to select the photometric sample,

149: Appendix \ref{stargal} discusses issues of star-galaxy separation,

150: and Appendix \ref{photdr5} briefly describes an earlier version

151: of the photo-z algorithm used for SDSS DR5 \citep{ade07}.

152:

153: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

154: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

155: \section{SDSS Photometric Catalog and Galaxy Selection}

156: \label{sel}

157: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

158: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

159:

160: The SDSS comprises a large-area

161: imaging survey of the north Galactic cap, a multi-epoch imaging survey of

162: an equatorial stripe in the south Galactic cap, and a spectroscopic survey of

163: roughly $10^6$ galaxies and $10^5$ quasars

164: \citep{yor00}.

165: The survey uses a dedicated, wide-field, 2.5m telescope \citep{gun06} at

166: Apache Point Observatory, New Mexico.

167: Imaging is carried out in drift-scan mode using a 142 mega-pixel camera

168: \citep{gun98} that gathers data in five broad bands, $u g r i z$, spanning

169: the range from 3,000 to 10,000 \AA \, \citep{fuk96}, with an effective exposure

170: time of 54.1 seconds per band.

171: The images are processed using specialized

172: software \citep{lup01,sto02} and are

173: astrometrically \citep{pie03} and photometrically \citep{hog01,tuc06}

174: calibrated using observations of a set of primary standard stars

175: \citep{smi02} observed on a neighboring 20-inch telescope.

176:

177: The imaging in the sixth SDSS Data Release (DR6) covers an essentially

178: contiguous region of the north Galactic cap, with only a few small patches

179: remaining to be observed. In any region where imaging runs overlap, one run is

180: declared primary\footnote{For the precise definition of primary objects see

181: {\tt http://cas.sdss.org/dr6/en/help/docs/glossary.asp\#P}}

182: and is used for spectroscopic target selection;

183: other runs are declared secondary.

184: The area covered by the DR6 primary imaging survey, including the

185: southern stripes, is $8417 \textrm{ deg}^2$, but

186: DR6 includes both the primary and secondary observations of

187: each area and source \citep{dr6}.

188:

189: \begin{figure}

190:   \begin{minipage}[t]{85mm}

191:     \begin{center}

192:       \resizebox{85mm}{!}{\includegraphics[angle=0]{f1.c.eps}}

193:     \end{center}

194:   \end{minipage}

195: \caption{Normalized $r$ magnitude distributions for various catalogs.

196:   {\it Top three rows:}

197:   the distributions of the spectroscopic catalogs used for photo-z

198:   training and validation are

199:   shown for 2SLAQ, CFRS, CNOC2, TKRS,

200:   DEEP and DEEP2, and the SDSS spectroscopic sample.

201:   $N_{tot}$ denotes the total number of galaxy measurements used

202:   from each catalog; for galaxies in regions with repeat SDSS imaging,

203:   each independent photometric measurement is counted separately.

204:   {\it Bottom row:} ({\it left})---the distribution of the combined

205:   spectroscopic sample; ({\it right})---the

206:   distribution for the SDSS photometric galaxy sample, where

207:   objects were classified as galaxies according to the

208:   photometric TYPE flag (see text).

209: }\label{dist.sdss}

210: \end{figure}

211:

212: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

213: %\section{Photometric selection of the galaxies} \label{sel}

214: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

215:

216:

217: The SDSS database provides a variety of measured magnitudes for each

218: detected object. Throughout this paper, we use dereddened model magnitudes to

219: perform the photometric redshift computations. To determine the model

220: magnitude, the SDSS photometric pipeline fits two

221: models to the image of each galaxy in each passband: a de Vaucouleurs (early-type) and

222: an exponential (late-type) light profile.

223: The models are convolved with the estimated point

224: spread function (PSF), with arbitrary axis ratio and position angle.

225: The best-fit model in the $r$ band (which is used to fix the model scale

226: radius) is then applied to the other passbands and convolved with the

227: passband-dependent PSFs to yield the model magnitudes.

228: Model magnitudes provide an unbiased color estimate in the absence of color

229: gradients \citep{sto02}, and the dereddening procedure removes the

230: effect of Galactic extinction \citep{sch98}.

231:

232: %%%%%%%%%%%%%%%%%%

233:

234: \begin{deluxetable}{c c | c c}

235: \tablewidth{0pt}

236: \tablecaption{Photometric Sample Properties}

237: \startdata

238: \hline

239: \hline

240: \multicolumn{2}{c}{\hspace{0.1 in} AB magnitude limits \hspace{0.2 in}  }

241: &\multicolumn{2}{c}{\hspace{0.2 in} RMS photometric \hspace{0.4 in}} \\

242: \multicolumn{2}{c}{}

243: & \multicolumn{2}{c}{\hspace{0.2 in} calibration errors } \\

244: \hline

245:   \hspace{0.1 in}  $u$ & 22.0 & \hspace{0.4 in} $r$   & 2\% \\

246:   \hspace{0.1 in}  $g$ & 22.2 & \hspace{0.4 in} $u-g$ & 3\% \\

247:   \hspace{0.1 in}  $r$ & 22.2 & \hspace{0.4 in} $g-r$ & 2\% \\

248:   \hspace{0.1 in}  $i$ & 21.3 & \hspace{0.4 in} $r-i$ & 2\% \\

249:   \hspace{0.1 in}  $z$ & 20.5 & \hspace{0.4 in} $i-z$ & 3\% \\

250: \enddata

251: \tablecomments{Magnitude limits are for 95\% completeness for point

252:   sources in typical seeing; 50\% completeness numbers are generally

253:   0.4 mag fainter \citep{ade07}. The median seeing for the SDSS imaging

254:   survey is $1.4''$.

255: } \label{propphot}

256: \end{deluxetable}

257:

258: To construct the photometric sample of galaxies for which we wish to

259: estimate photo-z's, we obtained

260: a catalog drawn from the SDSS CasJobs website

261: {\tt http://casjobs.sdss.org/casjobs/}.

262: We applied cuts on several SDSS photometric flags to obtain

263: a reasonably clean galaxy sample. In particular,

264: we selected all primary objects from DR6 that have the TYPE flag

265: equal to $3$ (the type for galaxy) and that do not

266: have any of the flags BRIGHT, SATURATED, or SATUR\_CENTER set.

267: %NOPETRO\_BIG set.

268: For the definitions of these flags we refer the reader to the

269: PHOTO flags entry at the SDSS

270: website\footnote{{\tt http://cas.sdss.org/dr6/en/help/browser/browser.asp}}

271: or to Appendix \ref{query}.

272: We also took into account the nominal SDSS flux limit

273: (see Table~\ref{propphot}) by only selecting galaxies with dereddened model

274: magnitude $r<22.0$.

275: The full database query we used is given in Appendix \ref{query}.

276:

277: The photometric galaxy catalog we have selected suffers from impurity and

278: incompleteness at some level, since

279: the photometric pipeline cannot

280: separate stars from galaxies with 100\% success

281: at faint magnitudes. We

282: describe some of our tests of star/galaxy separation in

283: Appendix \ref{stargal}, where we show that the SDSS TYPE flag

284: provides star/galaxy separation performance similar to other

285: methods.

286:

287: \begin{figure}

288:   \begin{minipage}[t]{85mm}

289:     \begin{center}

290:       \resizebox{85mm}{!}{\includegraphics[angle=0]{f2.c.eps}}

291:     \end{center}

292:   \end{minipage}

293:   \caption{Distribution of $g-r$ and $r-i$ colors for different SDSS samples. {\it Top row:} the color distributions for galaxies in the SDSS spectroscopic

294:     sample.

295:     {\it Middle row:} the color distributions for galaxies in the other (non-SDSS)

296:     spectroscopic training samples.

297:     {\it Bottom row:} the color distributions for galaxies in the photometric

298:     sample.

299:     As above, galaxy/star classification used the photometric TYPE flag.

300: }\label{dist.color.sdss}

301: \end{figure}

302:

303: The final photometric sample comprises $77,418,767$ galaxies.

304: The $r$ magnitude distribution of this sample is shown in

305: the bottom right panel of Fig.~\ref{dist.sdss}; the $g-r$ and

306: $r-i$ color distributions

307: are shown in the bottom panels of Fig.~\ref{dist.color.sdss}.

308:

309: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

310: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

311: \section{Spectroscopic Training and Validation sets} \label{tra}

312: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

313: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

314:

315:

316: Since our methods to estimate photo-z's and photo-z errors are

317: training-set based, we would ideally like the spectroscopic

318: training set to be

319: fully representative of the photometric sample to be analyzed, i.e., to have

320: similar statistical properties and magnitude/redshift distributions.

321: Training-set methods can be thought of as inherently Bayesian, in the sense

322: that the training-set distributions form effective priors for the analysis of the

323: photometric sample; to the extent that the training-set distributions

324: reflect those of the photometric sample, we may expect the photo-z estimates

325: to be unbiased (or at least they will not be biased by the prior).

326: Given the practical difficulties of carrying out spectroscopy at

327: faint magnitudes and low surface brightness, such an ideal generally cannot be achieved.

328: Realistically, all we can hope for is a training set that

329: (a) is large enough that statistical fluctuations are small and (b)

330: spans the same magnitude, color, and redshift ranges as the photometric sample.

331: Fortunately, our tests indicate that the estimated photo-z's

332: depend only weakly on the shape of the

333: redshift and magnitude distributions of the training set for the SDSS.

334:

335:

336: \begin{figure*}

337:   \begin{center}

338:     \resizebox{150mm}{!}{\includegraphics[angle=0]{f3.eps}}

339:     \caption{A simple FFMP network with 3 layers and configuration $2:1:1$.

340: The inputs are the

341: two magnitudes, $m_1$ and $m_2$.

342: $I_x$ denotes the input to node $x$, and $O_x$ is the corresponding output of this node.

343: The weights $w$ associated with each connection are found by training the network

344: using training and validation sets (see text).}

345:     \label{NNsimple}

346:   \end{center}

347: \end{figure*}

348:

349:

350: We have constructed a spectroscopic sample consisting of $639,911$

351: galaxies that have SDSS photometry measurements

352: (counting repeats; see below) and that have

353: spectroscopic redshifts measured by the SDSS or by

354: other surveys, as described below.

355: We imposed a magnitude limit of $r<23.0$ on the spectroscopic

356: sample and applied

357: additional cuts on the quality of the spectroscopic

358: redshifts reported by the different surveys.

359: Since we impose a limit of $r<22.0$ for the SDSS photometric sample,

360: the fainter limit chosen

361: for the spectroscopic training sample accommodates the full photometric

362: range of interest without creating boundary effects for photo-z's of

363: galaxies with magnitudes near the photometric sample limit of $r = 22$.

364: Each survey providing spectroscopic redshifts defines a redshift

365: quality indicator; we refer the reader to the respective publications listed

366: below for their precise definitions.

367: For each survey, we chose a redshift quality cut roughly corresponding

368: to 90\% redshift confidence or greater.

369: The SDSS spectroscopic sample

370: provides $531,672$ redshifts, principally from the MAIN and

371: Luminous Red Galaxy (LRG) samples, with confidence level

372: $z_{\rm conf} > 0.9$. The remaining redshifts are:

373: $21,123$ from the Canadian Network for Observational Cosmology

374: Field Galaxy Survey \citep[CNOC2;][]{yee00},

375: $1,830$ from the Canada-France Redshift Survey \citep[CFRS;][]{lil95}

376: with Class $> 1$,

377: $31,716$ from the Deep Extragalactic Evolutionary Probe \citep[DEEP;][]{deep2}

378: with $q_z$ =  A or B and from DEEP2

379: \citep{wei05}\footnote{{\tt http://deep.berkeley.edu/DR2/ }}

380: with $z_{\rm quality} \geq 3$,

381: $728$ from the Team Keck Redshift Survey \citep[TKRS;][]{wir04}

382: with $z_{\rm quality} > -1$, and

383: $52,842$ LRGs from the

384: 2dF-SDSS LRG and QSO Survey

385: \citep[2SLAQ;][]{can06}\footnote{{\tt http://lrg.physics.uq.edu.au/New\_dataset2/ }}

386: with $z_{\rm op} \geq 3$.

387:

388: We positionally matched the galaxies with spectroscopic redshifts against photometric

389: data in the SDSS {\tt BestRuns} CAS database, which allowed us

390: to match with photometric measurements in different SDSS imaging runs.

391: The above numbers for galaxies with redshifts count independent photometric

392: measurements of the same objects due to multiple SDSS imaging of the same

393: region; in particular SDSS Stripe 82 has been imaged a number of times.

394: The numbers of {\em unique} galaxies used from these surveys are

395: $1,435$ from CNOC2,

396: $272$ from CFRS,

397: $6,049$ from DEEP and DEEP2,

398: $389$ from TKRS, and

399: $11,426$ from 2SLAQ.

400: The SDSS spectroscopic samples were drawn from the SDSS primary galaxy sample and therefore are all unique.

401: The spectroscopic sample obtained by combining all these catalogs,

402: including the repeats, was divided into two catalogs of the

403: same size ($\sim 320,000$ objects each).

404: One of these catalogs was taken to be

405: the {\it training set} used by the photo-z and error estimators, and the other

406: was used as a {\it validation set} to carry out tests of photo-z

407: quality (see \S \ref{subsec:meth_photoz}). Our tests indicate that this

408: procedure of treating different

409: images of the same training/validation set galaxies as independent objects leads

410: to good results, provided all the photometric measurements for a given object

411: are confined to either the training set or the validation set and not mixed. By

412: contrast,

413: excluding such multiple images from the spectroscopic sample would result

414: in much smaller training and validation sets; these would be very sparse at

415: faint magnitudes, leading to much diminished photo-z quality there. On the other

416: hand, splitting

417: the repeat images of a given object between the training and validation sets

418: may result in ``over-fitting'' of the derived photo-z's

419: (see \S \ref{subsec:meth_photoz}).

420:
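As a consistency check on the numbers quoted in this section (derived here
purely from the counts given above), the per-survey numbers of photometric
measurements sum to the total size of the spectroscopic sample,
\begin{displaymath}
531,672 + 21,123 + 1,830 + 31,716 + 728 + 52,842 = 639,911 ,
\end{displaymath}
and the unique non-SDSS galaxies number
\begin{displaymath}
1,435 + 272 + 6,049 + 389 + 11,426 = 19,571 .
\end{displaymath}
The non-SDSS surveys therefore contribute $639,911 - 531,672 = 108,239$
photometric measurements, i.e., an average of roughly
$108,239 / 19,571 \approx 5.5$ independent measurements per unique galaxy.
This is only an average; the number of repeat images per object is
presumably higher in repeatedly imaged regions such as Stripe 82.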

421: The $r$-magnitude and color ($g-r$ and $r-i$)

422: distributions for the spectroscopic samples and for

423: the photometric sample are shown in Figs. \ref{dist.sdss} and

424: \ref{dist.color.sdss}. While the magnitude and color distributions of

425: the combined spectroscopic sample are not

426: identical to those of the photometric sample, the

427: spectroscopic sample does span the

428: range of apparent magnitude and color of the photometric sample.

429: To test the impact of having a training set that is not fully representative

430: of the photometric sample, we

431: divided the spectroscopic sample into smaller, alternate training and

432: validation sets. For instance,

433: to test the effect of the training-set magnitude distribution on the

434: photo-z estimates, we created a training set with a flat $r$

435: magnitude distribution and another with an $r$ magnitude distribution similar to that

436: of the

437: photometric sample. Our tests indicated that the photo-z quality

438: is not strongly affected by the magnitude

439: distribution of the training set.

440: The changes in the photo-z performance metrics

441: (the rms scatter and the 68\% CL region, defined below in

442: \S \ref{res}) were smaller than $10\%$ when the training-set magnitude

443: distribution was varied between these different choices.

444: Since using the entire spectroscopic

445: sample for the training and validation sets produced marginally better results

446: than all other cases tested, we have adopted this as our final choice. In addition,

447: we tested the effect of the size of the training set on

448: our photo-z calculations. We found that the photo-z performance metrics

449: defined in \S \ref{res-photoz}

450: are degraded by no more than 10\% when the training set is artificially

451: reduced to 10\% of its original size. Even when the training set is

452: reduced to $\sim 1\%$ of its original size, the photo-z performance metrics are

453: degraded by less than $25\%$. This gives us confidence that

454: the spectroscopic training set size used here is sufficient for extracting

455: nearly optimal photo-z estimates.

456:

457: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

458: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

459: \section{Methods}\label{met}

460: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

461: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

462:

463: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

464: \subsection{ANN and NNP Photometric redshifts}

465: \label{subsec:meth_photoz}

466: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

467:

468: The ANN method that we use to estimate galaxy photo-z's is

469: a general classification and interpolation tool used

470: successfully in an array of fields such as handwriting recognition,

471: automatic aircraft

472: piloting\footnote{{\tt http://www.nasa.gov/centers/dryden/news/NewsReleases/2003/03-49.html}},

473: detecting credit card

474: fraud\footnote{{\tt http://www.visa.ca/en/about/visabenefits/innovation.cfm}},

475: and extracting astronomically interesting sources in a telescope image

476: \citep{bertin96}.

477:

478: We use a particular type of ANN called a Feed Forward Multilayer

479: Perceptron (FFMP) to map the relationship between photometric observables

480: and redshifts.

481: An FFMP network consists of several input nodes, one or more hidden layers,

482: and several output nodes, all interconnected by weighted connections

483: (see Fig.~\ref{NNsimple}).

484: We follow the notation of \cite{col04} and denote a network with

485: $N_i$ input nodes, $N_{h_j}$ nodes in hidden layer $j$, and $N_o$

486: output nodes as $N_i:N_{h_1}:N_{h_2}:\ldots:N_{h_m}:N_o$.

487: For each input object, the input photometric

488: data (e.g., magnitudes, colors, concentrations, etc.)

489: are fed into the input

490: nodes of the FFMP, which fire signals according to the values of the

491: input data.

492: Each node in a hidden layer receives a total input which is a weighted

493: sum of the outputs from the nodes in the previous layer,

494:  i.e., node $i$ in a hidden layer receives an input $I_i$ given by

495:

496: \begin{equation}

497: I_i = \sum_j w_{ij} O_j,

498: \end{equation}

499:

500: \noindent where $O_j$ is the output of the $j^{\rm th}$ node of the previous

501: layer and $w_{ij}$ is the weight of the connection between node $i$ in

502: the hidden layer and node $j$ in the previous layer.

503: Given the input $I_i$, the output $O_i$ of node $i$ is a function $f$ of the

504: input,

505:

506: \begin{equation}

507: O_i=f(I_i), \label{act}

508: \end{equation}

509:

510: \noindent where $f$ is the activation function.

511: Repeating this process, signals propagate up to the output nodes.

512: The activation function is typically a sigmoid function:

513:

514: \begin{equation}

515: f(I_i) = \frac{1}{1 + e^{-I_i}}. \label{sigm}

516: \end{equation}

517:

518: \noindent However, there are various alternatives, such as step

519: functions and hyperbolic tangents.

520: \cite{van04} show that the choice of activation functions makes

521: no significant difference in the result.

522:

523: We use $X$:20:20:20:1 networks to estimate photo-z's, where $X$ is the

524: number of input photometric parameters per galaxy. % in this work.

525: The corresponding number of degrees of freedom (the number of weights) is

526:  roughly 1,000, depending on the actual value of $X$.

527: We use hyperbolic tangent functions as the activation function of the

528: hidden layers and a linear activation function for the output layer.

529:
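As a rough check on the quoted number of weights, one can count the
connections directly. The estimate below assumes fully connected adjacent
layers and no bias terms (an assumption made here for illustration; the
counting changes slightly if biases are included). An $X$:20:20:20:1 network
then has
\begin{displaymath}
N_w = 20X + 20\times 20 + 20\times 20 + 20\times 1 = 20X + 820
\end{displaymath}
weights, i.e., $N_w = 920$ for $X=5$ inputs and $N_w = 1020$ for $X=10$
inputs. Adding one bias per hidden and output node would contribute a
further $20+20+20+1 = 61$ parameters, which does not change the
order-of-magnitude estimate.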

530: Despite the occasional aura of mystery surrounding neural networks,

531: an FFMP is nothing more than a complex

532: mathematical function; in fact, one can always write down the analytic

533: expression corresponding to a neural network function.

534:
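For illustration, consider the $2:1:1$ network of Fig.~\ref{NNsimple}.
The expression below is a sketch under stated assumptions: the input nodes
pass the magnitudes $m_1$ and $m_2$ through unchanged, the single hidden node
uses a hyperbolic tangent activation, the output node is linear, and there are
no bias terms. The weight labels $w_1$, $w_2$, and $w_3$ are introduced here
only for this example. Applying the weighted-sum rule and Eq.~(\ref{act})
layer by layer gives
\begin{displaymath}
z_o(m_1,m_2) = w_3 \tanh\left(w_1 m_1 + w_2 m_2\right) .
\end{displaymath}
Larger networks, such as the $X$:20:20:20:1 configuration, correspond to
nested sums of the same form, with one level of nesting per hidden layer.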

535: Once the network configuration is specified, it can be trained to

536: output an estimate of redshift given the input photometric observables.

537: The training process involves

538: finding the set of weights $w_{ij}$ that

539: minimize a score function $E$, chosen here to be

540:

541: \begin{equation}

542: E = \frac{1}{2}\sum_i(z_{\rm spec}^{i} - z_o^{i})^2 ~,

543: \label{eq:score}

544: \end{equation}

545:

546: \noindent where $z_{\rm spec}^{i}$ is the measured spectroscopic redshift of galaxy $i$, $z_o^{i}$ is the

547: redshift returned by the output node for that galaxy, and the sum is over all galaxies

548: in the training set. Note that the choice of score function is not unique,

549: and different choices will in general lead to different photo-z estimates.

550: The minimization of this score function can be done efficiently

551: because its derivatives with respect to the weights

552: are available analytically.

553: We use a Variable Metric method as described in \cite{pre92} for the minimization.

554:
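To make the analytic derivatives explicit, the following sketch applies the
chain rule to Eq.~(\ref{eq:score}). It is the standard back-propagation
recursion, written under the assumptions of a linear output node and no bias
terms, and is not necessarily the form used in our code. We use $n$ to label
galaxies, to avoid confusion with the node indices, and introduce the
auxiliary quantity $\delta_i \equiv \partial z_o / \partial I_i$ for this
example only. Since $I_i = \sum_j w_{ij} O_j$ implies
$\partial I_i / \partial w_{ij} = O_j$,
\begin{displaymath}
\frac{\partial E}{\partial w_{ij}} =
 -\sum_n \left(z_{\rm spec}^{n} - z_o^{n}\right) \delta_i^{n}\, O_j^{n} .
\end{displaymath}
The $\delta_i$ are obtained by working backward from the output: for the
linear output node $\delta_{\rm out} = 1$, and for a hidden node $i$ feeding
the nodes $k$ of the next layer,
\begin{displaymath}
\delta_i = f'(I_i) \sum_k w_{ki}\, \delta_k ,
\end{displaymath}
where $f'(I_i) = 1 - O_i^2$ for a hyperbolic tangent activation.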

555: In machine learning, over-fitting refers

556: to the tendency of an algorithm with many adjustable parameters

557: to fit to the noise in the training set data.

558: In order to avoid over-fitting, we use the technique of

559: early stopping.

560: The spectroscopic sample is divided into two

561: independent subsets, the

562: {\it training} and {\it validation} sets,

563: and the formal minimizations are done using the training set.

564: After each minimization step, the network is evaluated on the

565: validation set, and

566: the set of weights that performs best on the validation set

567: is chosen as the final set. Another issue in machine learning is that

568: minimization procedures that start at different initial choices of weights

569: generally end at different local minima of the score

570: function.

571: To reduce the chance of ending in a less-than-optimal local minimum,

572: we train five networks starting from different positions in the space of weights.

573: Among these, we choose the network that gives the lowest photo-z scatter

574: (cf. Eq. \ref{eq:score})

575: in the validation set.

576: For more details of our implementation of the ANN and its performance on

577: mock catalogs and real data, see \cite{cun07}.

578:
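The selection procedure described above can be summarized schematically as
follows. The symbols $w^{(r,t)}$, $E_{\rm val}$, $t_r^{*}$, and $r^{*}$ are
notation introduced here for illustration, and we assume for simplicity that
validation performance is measured by the score of Eq.~(\ref{eq:score})
evaluated on the validation set. Let $w^{(r,t)}$ denote the weights of run
$r$ ($r = 1,\ldots,5$) after minimization step $t$ on the training set.
Early stopping keeps, for each run, the step
\begin{displaymath}
t_r^{*} = \mathop{\rm arg\,min}_{t}\; E_{\rm val}\left(w^{(r,t)}\right) ,
\end{displaymath}
and the final network is taken from the run
\begin{displaymath}
r^{*} = \mathop{\rm arg\,min}_{r}\; E_{\rm val}\left(w^{(r,t_r^{*})}\right) .
\end{displaymath}
The training set enters only through the minimization steps that generate
the sequence $w^{(r,t)}$; the validation set is used only to choose among
the weights so generated.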

579: The ANN photo-z algorithm is very flexible in the sense that it is easy

580: to change the input parameters, the training set, and the network configurations.

581: We tried a variety of combinations of possible input photometric

582: observables to see their effects on photo-z quality.

583: We calculated photo-z's using galaxy magnitudes, colors, and the

584: concentration indices for some or all of the passbands.

585: The concentration index $c_i$ in passband $i$ is defined as the ratio of {\tt PetroR50}

586: to {\tt PetroR90}, the radii that enclose 50\% and 90\% of the

587: Petrosian flux, respectively. Early-type (E and S0) galaxies, with centrally

588: peaked surface brightness profiles, tend to have low values of the

589: concentration index, while late-type spirals, with quasi-exponential light

590: profiles, typically have higher values of $c$.

591: Previous studies \citep{morg58,shi01,yam05,par05} have shown

592: that the concentration parameter correlates well

593: with galaxy morphological type, and we used it to help break the

594: degeneracy between redshift and galaxy type.

595: We present the photo-z results for different combinations of input

596: parameters in \S\ref{res}.

597:

598: For comparison, we also computed photo-z's for the

599: validation set using another empirical method, the Nearest Neighbor

600: Polynomial (NNP) technique \citep{cun07}.

601: In NNP, to derive a photo-z for a galaxy in the photometric sample,

602: we look for its training-set nearest neighbors in the space of

603: photometric observables (magnitudes, colors, etc.).

604: Suppose we have $N_D$ photometric data entries for each galaxy.

605: The data vector for the galaxy of interest in the photometric sample is

606: denoted by $D^{\mu}=(D^1,D^2,\ldots,D^{N_D})$,

607: while the data vector for the $i^{\rm th}$ galaxy in the training set is

608: $D^{\mu}_i=(D^1_i,D^2_i,\ldots,D^{N_D}_i)$.

609: The distance $d_i$ between the photometric object and the $i^{\rm th}$

610: training set galaxy is defined using a flat metric in data space,

611:

612: \begin{equation}

613: d_i^2 = \sum_{\mu=1}^{N_D} (D^{\mu} - D_{i}^{\mu})^2~. \label{nndef}

614: \end{equation}

615:

616:

617: \noindent The nearest neighbors are the training-set objects

618: for which $d_i$ is minimum. Once the nearest neighbors for a given

619: galaxy are identified,

620: they are used to fit the coefficients of a local, low-order polynomial relation

621: between photometric observables and redshift.

622: The galaxy photo-z is then obtained by applying

623: the derived relation to the photometric object.

624:

625: For the NNP method employed in this work, we take the

626: photometric data $D^{\mu}$ in Eq.~(\ref{nndef})

627: to be the four ``adjacent'' galaxy colors $u-g, \ g-r, \ r-i, \ i-z$; we found that

628: this choice produces results marginally better than using the galaxy

629: magnitudes.

630: We use the nearest $1000$ neighbors to fit a quadratic polynomial

631: relation between redshift and the photometric data, here chosen

632: to be the five magnitudes in each passband ($ugriz$) and their

633: corresponding concentration indices.

634: We note that \cite{wan07} used a similar technique to estimate

635: photo-z's for a small sample of SDSS {\it spectroscopic} galaxies.

636: They applied the Kernel Regression method of order 0, weighting

637: the training-set neighbors and computing photo-z's by using the

638: weighted average of the neighbors' redshifts.

639: Our NNP method is closer to a Kernel Regression of order 2, since

640: we perform quadratic fits; however, we do not apply variable weights to the neighbors

641: but treat them equally in the fit.
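
The NNP procedure described above can be illustrated with a short sketch. This is not the code used to produce the catalog; it is a minimal example assuming NumPy, a brute-force neighbor search, and a full quadratic polynomial that includes cross terms (the text does not specify whether cross terms are used). The names {\tt quadratic\_design} and {\tt nnp\_photoz} are hypothetical. The neighbor search and the polynomial fit take separate inputs, mirroring the use of colors for the former and magnitudes plus concentration indices for the latter.

```python
import numpy as np

def quadratic_design(X):
    """Design matrix with a constant, linear terms, and all second-order terms.

    Including the cross terms is an assumption of this sketch.
    """
    n, d = X.shape
    cols = [np.ones(n)]
    cols.extend(X[:, j] for j in range(d))
    cols.extend(X[:, j] * X[:, k] for j in range(d) for k in range(j, d))
    return np.column_stack(cols)

def nnp_photoz(nbr_train, fit_train, z_train, nbr_obj, fit_obj, n_neighbors=1000):
    """Photo-z for one object from an unweighted quadratic fit to its neighbors.

    nbr_* : observables used for the neighbor search (e.g., the four colors)
    fit_* : observables entering the polynomial (e.g., magnitudes and
            concentration indices)
    """
    # Flat (Euclidean) metric in the space of search observables.
    d2 = np.sum((nbr_train - nbr_obj) ** 2, axis=1)
    k = min(n_neighbors, len(z_train))
    idx = np.argsort(d2)[:k]

    # Equal-weight least-squares fit of redshift against the fit observables.
    A = quadratic_design(fit_train[idx])
    coeffs, *_ = np.linalg.lstsq(A, z_train[idx], rcond=None)

    # Apply the local relation to the object of interest.
    return float((quadratic_design(fit_obj[None, :]) @ coeffs)[0])
```

With ten input observables the full quadratic has 66 coefficients, so the number of neighbors must be well above that for the fit to be constrained; the value of 1000 quoted in the text satisfies this comfortably.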

642:

643: Whereas the ANN method provides

644: a single, nonlinear, global fit using the whole

645: training set and applies the derived photo-z relation to all photometric objects,

646: the NNP method yields a separate, linear (in parameters), local fit for

647: each photometric object using its neighbors. If

648: the galaxy magnitude-concentration-redshift hypersurface is a differentiable manifold,

649: i.e., if it can be locally approximated by a hyperplane even though it

650: is globally curved, then these two photo-z methods should be roughly

651: equivalent. Indeed, as we show in \S \ref{res}, their performance is very similar.

652:

653: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

654: \subsection{Photometric redshift errors}\label{meter}

655: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

656:

657: We estimated photo-z errors for objects in the photometric catalog using

658: the Nearest Neighbor Error (NNE) estimator \citep{oya07}.

659: The NNE method is training-set based, with

660: a neighbor selection similar to the NNP photo-z estimator; it

661: associates photo-z errors to photometric objects by considering the

662: errors for objects with similar multi-band magnitudes in the

663: validation set.

664: We use the validation set, because the photo-z's of the training set could be

665: over-fit, which would result in NNE underestimating the photo-z errors.

666:

667: The procedure to calculate the redshift error for a galaxy in the photometric

668: sample is as follows.

669: We find the validation-set nearest neighbors to the galaxy of

670: interest. In contrast to NNP,

671: where the distance in Eq.~(\ref{nndef}) was defined in color space,

672: the NNE distance is defined in magnitude space, since photo-z errors

673: correlate strongly with magnitude.

674: Since the selected nearest neighbors are in the spectroscopic sample,

675: we know their photo-z errors, $\delta z = z_{\rm phot}-z_{\rm spec}$, where

676: $z_{\rm phot}$ is computed using the ANN or the NNP method.

677: We calculated the $68\%$ width of the $\delta z$ distribution

678: for the neighbors and assigned that number as the photo-z error

679: estimate for the photometric galaxy. Here we selected

680: the nearest $200$ neighbors of each object to estimate its photo-z error.

681: In studies of photo-z error estimators applied

682: to mock and real galaxy catalogs, we found that NNE

683: accurately predicts the photo-z error when the training set is

684: representative of the photometric sample \citep{oya07}.
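
As an illustration only, the NNE steps above can be sketched in a few lines of Python. The function name {\tt nne\_error} is hypothetical, the neighbor search is brute force, and the ``68\% width'' is interpreted here as the value of $|\delta z|$ that contains 68\% of the neighbors, which is an assumption chosen to match the $\sigma_{68}$ convention used later in the paper.

```python
import math

def nne_error(val_mags, val_dz, obj_mags, n_neighbors=200):
    """Photo-z error estimate for one object from its validation-set neighbors.

    val_mags : list of magnitude vectors for the validation set
    val_dz   : list of z_phot - z_spec for the same validation objects
    obj_mags : magnitude vector of the photometric object
    """
    def dist2(m):
        # Flat metric in magnitude space.
        return sum((a - b) ** 2 for a, b in zip(m, obj_mags))

    order = sorted(range(len(val_mags)), key=lambda i: dist2(val_mags[i]))
    nearest = order[:n_neighbors]

    # Assumed definition: smallest |dz| containing at least 68% of the neighbors.
    abs_dz = sorted(abs(val_dz[i]) for i in nearest)
    rank = max(1, math.ceil(0.68 * len(abs_dz)))
    return abs_dz[rank - 1]
```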

685:

686: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

687: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

688: \subsection{Estimating the Redshift Distribution}\label{estdist}

689: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

690: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

691:

692: As we shall see in \S \ref{res-photoz}, estimates for

693: galaxy photo-z's suffer from statistical biases that in general

694: cannot be completely removed on an object-by-object basis. However, we

695: can seek an unbiased estimate of the true redshift {\it distribution}

696: for the photometric sample that is independent of individual

697: galaxy photo-z estimates. For some statistical applications,

698: the redshift distribution of the photometric sample, as opposed

699: to individual galaxy photo-z's, is all that is required.

700: One way to estimate this distribution is to

701: assign a weight to every galaxy in the spectroscopic sample

702: such that the {\it weighted} spectroscopic sample has the same

703: distributions of magnitudes and colors as the photometric sample.

704: The $z_{\rm spec}$ distribution of this weighted spectroscopic

705: sample provides an estimate of the true, underlying

706: redshift distribution of the photometric sample.

707:

708: The weight $W^{\alpha}$ of the $\alpha^{\rm th}$ spectroscopic

709: galaxy is calculated by comparing

710: the local density around the galaxy in the spectroscopic sample with

711: the density of the corresponding region in the photometric sample.

712: The local density is evaluated by counting the number of

713: nearest neighbors using the distance measured in the space of photometric

714: observables, as in Eq.~(\ref{nndef}). We fix the number of spectroscopic

715: neighbors, $N_{\rm S}$, which determines the distance $d_{\rm max}$

716: to the $N_{\rm S}^{\rm th}$-nearest spectroscopic neighbor.

717: We then find the number of neighbors $N_{\rm P}$ in the photometric

718: sample within the same distance $d_{\rm max}$ of the spectroscopic

719: galaxy. Up to an arbitrary normalization factor, the weight is defined as

720:

721: \begin{eqnarray}

722: W^{\alpha} \sim \frac{N_{\rm P} }{ N_{\rm S} } ~.

723: \label{eqn:weight}

724: \end{eqnarray}

725:

726: \noindent For our estimates, we chose $N_{\rm S}=20$, which provides a good

727: match of the weighted spectroscopic distributions of magnitudes

728: and colors to those of the photometric sample. We note that if

729: additional cuts in magnitude or color are applied to the photometric

730: sample, then this procedure must be repeated for the newly selected photometric

731: sample.

732: More details and tests of this method and comparisons with

733: other methods for estimating the

734: underlying redshift distribution (e.g., deconvolving the error distribution

735: from the \zphot \ distribution) will be presented

736: separately \citep{lim07}.
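
The weighting of Eq.~(\ref{eqn:weight}) can be sketched as follows. This is a minimal, brute-force illustration rather than the implementation used for the catalog. The function name {\tt neighbor\_weights} is hypothetical, and two choices are assumptions of the sketch: a galaxy is not counted as its own spectroscopic neighbor, and the arbitrary normalization is fixed by requiring the weights to sum to unity.

```python
def neighbor_weights(spec, phot, n_spec=20):
    """Relative weights N_P / N_S for every galaxy in the spectroscopic sample.

    spec, phot : lists of observable vectors (magnitudes, colors, ...)
    n_spec     : fixed number of spectroscopic neighbors, N_S
    """
    def dist2(a, b):
        # Flat metric in the space of photometric observables.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    raw = []
    for i, s in enumerate(spec):
        # Squared distance to the N_S-th nearest spectroscopic neighbor
        # (excluding the galaxy itself, an assumption of this sketch).
        d2_spec = sorted(dist2(s, t) for j, t in enumerate(spec) if j != i)
        d2_max = d2_spec[n_spec - 1]

        # Number of photometric objects within the same distance.
        n_phot = sum(1 for p in phot if dist2(s, p) <= d2_max)
        raw.append(n_phot / n_spec)

    total = sum(raw)
    return [w / total for w in raw]
```

A histogram of $z_{\rm spec}$ in which each spectroscopic galaxy contributes its weight then gives the estimate of the redshift distribution of the photometric sample.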

737:

738:

739: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

740: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

741:

742: \begin{figure*}

743:   \begin{center}

744:     \begin{minipage}[t]{46mm}

745:       \begin{center}

746:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4a.eps}}

747:       \end{center}

748:     \end{minipage}

749:     \begin{minipage}[t]{46mm}

750:       \begin{center}

751:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4b.eps}}

752:       \end{center}

753:     \end{minipage}

754:     \begin{minipage}[t]{46mm}

755:       \begin{center}

756:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4c.eps}}

757:       \end{center}

758:     \end{minipage}

759:     \begin{minipage}[t]{46mm}

760:       \begin{center}

761:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4d.eps}}

762:       \end{center}

763:     \end{minipage}

764:     \begin{minipage}[t]{46mm}

765:       \begin{center}

766:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4e.eps}}

767:       \end{center}

768:     \end{minipage}

769:     \begin{minipage}[t]{46mm}

770:       \begin{center}

771:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4f.eps}}

772:       \end{center}

773:     \end{minipage}

774:     \begin{minipage}[t]{46mm}

775:       \begin{center}

776:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4g.eps}}

777:       \end{center}

778:     \end{minipage}

779:     \begin{minipage}[t]{46mm}

780:       \begin{center}

781:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4h.eps}}

782:       \end{center}

783:     \end{minipage}

784:     \begin{minipage}[t]{46mm}

785:       \begin{center}

786:       \resizebox{46mm}{!}{\includegraphics[angle=0]{f4i.eps}}

787:       \end{center}

788:     \end{minipage}

789:   \end{center}

790:  \caption{ $z_{\rm phot}$ versus $z_{\rm spec}$ for the validation set for

791:    different ranges of $r$ magnitude and for different photo-z techniques.

792:    {\it Left column:} objects with $r<20$; {\it middle column:} objects with $r>20$;

793:    {\it right column:} all objects.

794:    {\it Top row:} ANN case D1, where the input photometric data comprise

795:    the 5 magnitudes ($ugriz$) and the 5 concentration parameters, and the training

796:    is split into 5 bins of $r$ magnitude.

797:    {\it Middle row:} ANN case CC2, where the input data are

798:    the 4 colors $u-g$, $g-r$, $r-i$, $i-z$, and 3 concentration parameters $c_gc_rc_i$.

799:    {\it Bottom row:} results for the NNP method, where the input data are

800:    the 5 magnitudes and 5 concentration parameters.

801:    In all cases, the photo-z methods

802:    used a training set with $\sim 320,000$ objects, and the derived solutions were

803:    applied to an independent validation set with $\sim 309,000$ objects and

804:    $r < 22$, reflecting the magnitude limit of the photometric sample.

805:    The solid line in each panel indicates $z_{\rm phot}=z_{\rm spec}$; the

806:    dashed and dotted lines show the 68\% and 95\% confidence regions as a function

807:    of $z_{\rm spec}$.

808:    The points display results for a random $10\%$ subset of the validation set in

809:    each magnitude range.

810:  }

811: \label{zpzs_valid_all}

812: \end{figure*}

813:

814: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

815: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

816: \section{Results} \label{res}

817: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

818: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

819:

820: \subsection{Photometric redshifts}

821: \label{res-photoz}

822:

823: The photo-z precision (variance) and accuracy (bias) are

824: limited by a number of factors. There are

825: intrinsic degeneracies in

826: magnitude-redshift space: low-luminosity, intrinsically red galaxies at low

827: redshift can have apparent magnitudes similar to those of high-luminosity,

828: intrinsically blue galaxies at high redshift.

829: This natural degeneracy is amplified by

830: photometric errors, since magnitude uncertainties

831: propagate to photo-z errors.

832: In addition to these observational limitations, which are

833: determined by the photometric precision and the number of passbands of a survey,

834: the photo-z estimator itself may have inherent limitations. For example,

835: for training set methods, the size and representativeness of the training

836: set are important factors, as are the number of parameters or weights in

837: the fitting functions.

838:

839: To test the quality of the photo-z estimates,

840: we use four photo-z performance metrics.

841: The first two metrics are the photo-z bias, $z_{\rm bias}$, and the photo-z {\it rms}

842: scatter, $\sigma$, both averaged over all $N$ objects in the validation

843: set, defined by

844:

845: \begin{eqnarray}

846: z_{\rm bias}&=&\frac{1}{N}\sum_{i=1}^{N}\left( z_{\rm phot}^{i}-z_{\rm spec}^{i}\right) ~, \\

847: \sigma^2&=&\frac{1}{N}\sum_{i=1}^{N}\left(z_{\rm phot}^{i}-z_{\rm spec}^{i}\right)^2 ~.

848: \end{eqnarray}

849:

850: \noindent The third performance metric, denoted by $\sigma_{68}$, is

851: the range containing $68\%$ of the validation set objects in the distribution of

852: $\delta z = z_{\rm phot}-z_{\rm spec}$. This metric is useful because

853: the probability distribution function

854: $P(\delta z)$ is in general non-Gaussian and asymmetric (for a Gaussian

855: distribution, $\sigma$ and $\sigma_{68}$ coincide). Explicitly, $\sigma_{68}$ is

856: defined by the value of $|z_{\rm phot} - z_{\rm spec}|$ such that 68\% of the objects have $|z_{\rm phot} - z_{\rm spec}| < \sigma_{68}$.

857: We also use the $95\%$ region $\sigma_{95}$, defined similarly.

858: In addition to these global metrics, we also define local versions of them

859: in bins of redshift or magnitude.
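
For concreteness, the four global metrics defined above can be computed as in the following sketch. The function name {\tt photoz\_metrics} is hypothetical, and the percentile convention (the smallest $|\delta z|$ that contains at least the requested fraction of objects) is an assumption; other conventions differ only at the level of one object in the sorted list.

```python
import math

def photoz_metrics(z_phot, z_spec):
    """Return z_bias, sigma, sigma_68, and sigma_95 for paired redshift lists."""
    dz = [p - s for p, s in zip(z_phot, z_spec)]
    n = len(dz)

    z_bias = sum(dz) / n
    # rms scatter about zero (not about the mean), as in the definition above.
    sigma = math.sqrt(sum(d * d for d in dz) / n)

    abs_dz = sorted(abs(d) for d in dz)

    def containment(frac):
        # Smallest |dz| such that at least `frac` of the objects lie within it.
        rank = max(1, math.ceil(frac * n))
        return abs_dz[rank - 1]

    return {
        "z_bias": z_bias,
        "sigma": sigma,
        "sigma_68": containment(0.68),
        "sigma_95": containment(0.95),
    }
```

The local versions of the metrics follow by applying the same function to the subset of objects in each redshift or magnitude bin.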

860:

861: \begin{deluxetable}{llcc}

862: \tablewidth{0pt}

863: \tablecaption{Summary of ANN cases}

864: \startdata

865: \hline

866: \hline

867: \multicolumn{1}{c}{Case} & \multicolumn{1}{c}{Inputs/Description} & \multicolumn{1}{c}{$\sigma$} & \multicolumn{1}{c}{$\sigma_{68}$}\\

868: \hline

869: O1& $ugriz$                                    &0.0525 & 0.0229\\

870: C1& $ugriz$ + $c_uc_gc_rc_ic_z$                &0.0519 & 0.0224\\

871: D1& $ugriz$ + $c_uc_gc_rc_ic_z$. Split training&0.0519 & 0.0209\\

872: CC1&$u-g$, $g-r$, $r-i$, $i-z$                 &0.0668 & 0.0272\\

873: CC2&$u-g$, $g-r$, $r-i$, $i-z$ + $c_gc_rc_i$   &0.0593 & 0.0245\\

874: \enddata

875: \label{table:method}

876: \tablecomments{Photo-z performance metrics $\sigma$ and $\sigma_{68}$

877: for the validation set using different input parameters

878: (magnitudes, colors, and concentration indices) and training procedures.}

879: \end{deluxetable}

880:

881: To search for an optimal photo-z estimator, we computed

882: photo-z's using the ANN method with

883: different combinations of input photometric observables. Five of

884: these combinations are listed in Table \ref{table:method}.

885: In the first case, dubbed O1, the training and photo-z estimation

886: are carried out using only the five magnitudes $ugriz$. In case C1,

887: we use the five magnitudes and the five concentration indices

888: $c_uc_gc_rc_ic_z$ as the input parameters. In case CC1, we

889: use only the four colors

890: $u-g$, $g-r$, $r-i$, and $i-z$. In case CC2, we combine the

891: four colors with

892: the concentration indices $c_gc_rc_i$ in the $gri$ filters.

893: Finally, in case D1, we use the $ugriz$ magnitudes

894: and the $c_uc_gc_rc_ic_z$ concentration indices, but we split the

895: training set and the photometric sample into 5 bins of $r$ magnitude and

896: perform separate ANN fits in each bin.

897: In all five cases, we use an ANN with three hidden layers and tune

898: the number of hidden nodes to keep the total

899: number of degrees of freedom of the network roughly the same for all cases.

900:

901:

902: Table~\ref{table:method} provides a summary of the performance results of the

903: different ANN cases.

904: We find that using concentration indices in addition to magnitudes

905: (C1 vs. O1) helps break some degeneracies and reduces the

906: photo-z scatter by a few percent.

907: Using only colors (CC1) degrades the photo-z performance by as much as 20\%,

908: mostly because the degeneracy between intrinsically red, nearby galaxies

909: and intrinsically blue, distant galaxies (with red observed colors)

910: cannot be broken.

911: Adding concentration indices to color-only training (CC2)

912: helps break such a degeneracy, because the concentration index correlates

913: with galaxy type and hence intrinsic color. Of the five,

914: case CC2 also yields the most realistic photometric redshift

915: distribution for the photometric sample (see \S \ref{subsec:red_dist}).

916: Finally, splitting the training set and photometric sample into

917: magnitude bins (D1) produces

918: results with the best performance metrics ($\sigma$ and $\sigma_{68}$) of

919: all the ANN cases we have tested.

920: We choose D1 and CC2 as the best ANN cases and describe their

921: results in more detail below; their outputs for the photometric sample

922: are included in the public DR6 database.

923:

924: In Fig.~\ref{zpzs_valid_all}, we plot photometric redshift, \zphot,

925: for all objects in the validation set vs. true

926: spectroscopic redshift, \zspec, for the different photo-z methods

927: and cases and in different ranges of $r$ magnitude.

928: The top row shows results for ANN case D1, the middle row shows

929: the performance of ANN case CC2, and the bottom row shows results for

930: the NNP method using magnitudes and concentration indices as the input

931: parameters. In each panel,

932: the values of the corresponding global

933: photo-z performance metrics $\sigma$ and $\sigma_{68}$ are shown.

934: The redshift bias $z_{\rm bias}$ is typically much smaller than $\sigma$ or

935: $\sigma_{68}$, since the photo-z methods are designed to minimize it (see

936: Fig. \ref{plot:statvsm}). In each panel of Fig. \ref{zpzs_valid_all},

937: the solid line traces

938: $z_{\rm phot}=z_{\rm spec}$, i.e., the line

939: for a perfect photo-z estimator.

940: The dashed and dotted lines show the corresponding $68\%$ and $95\%$ regions,

941: defined as above but in $z_{\rm spec}$ bins. Although

942: each photo-z method probes the

943: hypersurface defined by the photometric observables and redshift in a different

944: way,

945: they produce very similar results, suggesting that our results are

946: limited not by the photo-z technique employed but by the

947: intrinsic degeneracies in magnitude-concentration-redshift space and

948: by the photometric errors.

949:

950: \begin{figure}

951:   \resizebox{85mm}{!}{\includegraphics[angle=0]{f5.eps}}

952:  \caption{The performance metrics

953:    $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$ for the ANN D1 and CC2

954:    validation sets are shown

955:    as a function of $r$ magnitude.

956:    CC2 performs relatively poorly for bright objects ($r < 16$), where the color-redshift

957:    relation is contaminated by faint objects with similar colors.  In D1,

958:    this problem is alleviated by the effective magnitude prior imposed by

959:    the training set. At faint magnitudes, the performance degrades as the photometric

960:    errors increase.

961:  }

962:  \label{plot:statvsm}

963: \end{figure}

964:

965:

966: \begin{figure*}

967:   \begin{center}

968:     \begin{minipage}[t]{81mm}

969:       \begin{center}

970:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f6a.c.eps}}

971:       \end{center}

972:     \end{minipage}

973:     \begin{minipage}[t]{81mm}

974:       \begin{center}

975:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f6b.c.eps}}

976:       \end{center}

977:     \end{minipage}

978:   \end{center}

979:   \caption{Performance metrics

980: $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$ for the ANN D1 and CC2 validation sets

981:     are shown as a function of $z_{\rm spec}$ for $r<20$ and $r>20$.

982:     The increased scatter for  objects with $z > 0.6$ is due to

983:     the 4000 \AA \ break shifting out of the $r$ passband at

984:     around $z = 0.7$; beyond that redshift, the estimator effectively relies

985:     on only two passbands ($i$ and $z$) to determine the photo-z's. Note that

986:     faint objects ($r > 20$) have worse scatter at low redshifts for

987:     both cases.  This is likely due to the fact that the faint, low-redshift

988:     objects in the validation set are predominantly blue

989:     dwarf or irregular galaxies that do not have

990:     strong 4000 \AA \ breaks; in this case, the photo-z estimator must rely on less

991:     pronounced spectral features, resulting in larger photo-z scatter.

992:   }

993:   \label{plot:statvsz}

994: \end{figure*}

995:

996: \begin{figure*}

997:   \begin{center}

998:     \begin{minipage}[t]{81mm}

999:       \begin{center}

1000: 	 \resizebox{81mm}{!}{\includegraphics[angle=0]{f7a.c.eps}}

1001: % BW      \resizebox{81mm}{!}{\includegraphics[angle=0]{plots/grVSz.rl20.ps}}

1002:       \end{center}

1003:     \end{minipage}

1004:     \begin{minipage}[t]{81mm}

1005:       \begin{center}

1006: 	 \resizebox{81mm}{!}{\includegraphics[angle=0]{f7b.c.eps}}

1007: % BW      \resizebox{81mm}{!}{\includegraphics[angle=0]{plots/grVSz.rg20.ps}}

1008:       \end{center}

1009:     \end{minipage}

1010:   \end{center}

1011:   \caption{

1012:     $g-r$ color vs. spectroscopic redshift for galaxies in the

1013:     validation set: {\it left panel:} galaxies with $r<20$; {\it right panel:}

1014:     galaxies with $r>20$. The solid curves show expected color-redshift relations of

1015:     galaxies with different SED types, calculated using the \cite{col80}

1016:     spectral templates. The different

1017:     colors (shades of grey)

1018: %BW: symbol and greyscale types

1019:     indicate galaxies from the different spectroscopic surveys contributing

1020:     to the validation set. The 2SLAQ objects, denoted by red triangles, were

1021:     selected to be mostly early-type galaxies. They are

1022:     responsible for the minimum in $\sigma$ vs. $z_{\rm spec}$

1023:     for the $r>20$ subsample in Fig. \ref{plot:statvsz}.

1024:   }

1025:   \label{plot:grvsz}

1026: \end{figure*}

1027:

1028:

1029: In Figs. \ref{plot:statvsm} and \ref{plot:statvsz}, we show the performance

1030: metrics

1031: $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$ as a function of $r$ magnitude

1032: and $z_{\rm spec}$ for the validation set for the two preferred ANN cases.

1033: We see that the photo-z precision degrades considerably

1034: for objects with $r > 20$.

1035: This increased scatter is expected, since the relative photometric errors

1036: increase as the nominal detection limit of the SDSS photometry is approached

1037: (see Table \ref{propphot}). While the bias for CC2 increases at $r<17$,

1038: we note that the fraction of objects in the photometric sample which are

1039: that bright is very small.

1040: As a function of redshift, $\sigma$ and $\sigma_{68}$ increase dramatically

1041: beyond $z \sim 0.6$

1042: for the validation set.

1043: For the $r < 20$ part of the sample, the number of spectroscopic objects with

1044: $z > 0.6$ is simply too small

1045: to characterize the redshift-magnitude surface, as shown in

1046: the left panel of Fig. \ref{plot:grvsz}. For the

1047: faint objects ($r > 20$), the scatter is low for $z$ between 0.4 and

1048: 0.6 and increases outside of that range.

1049: It is important to note that the photo-z performance metrics were

1050: calculated independently of spectral type.

1051: Since the neural network and the training set were not optimized

1052: for any specific galaxy population (e.g., galaxies in clusters), it is possible

1053: that certain galaxy types may have photo-z's with worse (or better!)

1054: biases and dispersion.

1055:

1056:

1057: In Figure~\ref{plot:grvsz}, we plot $g-r$ color versus spectroscopic

1058: redshift for the validation set for both bright ($r<20$) and faint ($r>20$) galaxies.

1059: The 2SLAQ and DEEP2 galaxies are highlighted by different

1060: colors (shades of grey),

1061: %BW: shades of grey,

1062: and the expected color-redshift relations for the four spectral templates from

1063: \cite{col80}

1064: (from early to late types) are indicated by the solid lines.

1065: We see that for the faint sample,  in the range $0.4 < z < 0.6$, the galaxies

1066: come mostly from the 2SLAQ survey, which used

1067: specific color cuts to select early-type galaxies at

1068: $z\sim0.5$.  Because early-type galaxies have a well-defined

1069: 4000 \AA \ break feature, their photo-z's are well determined and

1070: their photo-z scatter is low.

1071: Outside of the range $0.4 < z < 0.6$, the validation set at faint magnitudes

1072: is dominated by bluer galaxies

1073: that do not have strong, broad spectral features, resulting in the

1074: larger photo-z scatter seen in Fig. \ref{plot:statvsz}.

1075:

1076: Fig.~\ref{plot:statvsz} shows that the common assumption that the

1077: photo-z scatter

1078: scales as $(1+z)$ is not consistent with our estimates for the SDSS sample.

1079: The functional form of the scatter versus redshift depends

1080: strongly on the underlying galaxy type distribution.

1081:

1082: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1083: \subsection{Redshift Distributions}

1084: \label{subsec:red_dist}

1085: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1086:

1087: So far, we have considered the scatter and bias of photo-z estimates.

1088: As discussed in \S \ref{estdist}, it is also of interest to consider

1089: the predicted photo-z distribution as a whole. Different photo-z estimators

1090: may achieve similar values for the metrics $z_{\rm bias}$, $\sigma$, and $\sigma_{68}$,

1091: but predict different forms for the photo-z distribution of the photometric

1092: sample. As we shall see, this is the case with the two ANN cases D1 and CC2.

1093: We therefore define two additional performance metrics to quantify the

1094: quality of the predicted photo-z distribution.

1095: The first metric, $\sigma_{\rm dist}$, measures the {\it rms} difference between

1096: the binned $z_{\rm phot}$ and $z_{\rm spec}$ distributions of the validation set,

1097:

1098: \begin{eqnarray}

1099: \sigma^2_{\rm dist}&=&\frac{1}{N_{\rm bin}}\sum_{i=1}^{N_{\rm bin}}\left(P_{\rm phot}^{i}-P_{\rm spec}^{i}\right)^2,

1100: \end{eqnarray}

1101:

1102: \noindent where $P_{\rm phot}^{i}$ is the height of the

1103: $i^{\rm th}$ redshift bin of the $z_{\rm phot}$ distribution,

1104: $P_{\rm spec}^{i}$ is the height of the same redshift

1105: bin of the $z_{\rm spec}$ distribution, and $N_{\rm bin}$ is the total number

1106: of redshift bins used.

1107: Here we use $N_{\rm bin}=120$ equally spaced redshift bins running

1108: from $z=0$ to $z=1.2$.
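
A minimal sketch of the $\sigma_{\rm dist}$ calculation is given below. The function name {\tt sigma\_dist} is hypothetical. The text does not state how the bin heights are normalized; this sketch assumes each histogram is normalized as a probability density (counts divided by the sample size and the bin width), so the numerical values it returns need not match those quoted in the tables.

```python
import math

def sigma_dist(z_phot, z_spec, n_bin=120, z_min=0.0, z_max=1.2):
    """rms difference between binned z_phot and z_spec distributions."""
    width = (z_max - z_min) / n_bin

    def density(zs):
        counts = [0] * n_bin
        for z in zs:
            i = math.floor((z - z_min) / width)
            if 0 <= i < n_bin:          # objects outside the range are dropped
                counts[i] += 1
        # Assumed normalization: probability density per unit redshift.
        return [c / (len(zs) * width) for c in counts]

    p_phot = density(z_phot)
    p_spec = density(z_spec)
    mean_sq = sum((a - b) ** 2 for a, b in zip(p_phot, p_spec)) / n_bin
    return math.sqrt(mean_sq)
```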

1109:

1110:

1111: The second redshift distribution

1112: metric we employ is the KS statistic $D$, the

1113: maximum value of the absolute difference between the two ($z_{\rm phot}$ and

1114: $z_{\rm spec}$) cumulative

1115: redshift distribution

1116: functions. An advantage of the KS statistic is that it does not require

1117: binning the data in redshift. However, our

1118: use of the KS statistic to quantify the difference between the $z_{\rm phot}$

1119: and $z_{\rm spec}$ distributions of the validation set likely does

1120: not adhere to formal statistical practice,

1121: since it turns out that the probability for the KS statistic for both cases we consider

1122: is very close to zero \citep{pre92}.
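
The two-sample KS statistic $D$ can be computed directly from the sorted redshift lists, without binning, as in the following sketch. The function name {\tt ks\_statistic} is hypothetical, and the associated significance level is not computed here.

```python
def ks_statistic(a, b):
    """Maximum absolute difference between two empirical cumulative distributions."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        # Step to the next distinct data value and advance both samples past it.
        x = min(a[i], b[j])
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    # Once one sample is exhausted, the difference can only shrink.
    return d
```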

1123:

1124: Table \ref{table_sigdist_ks} shows the values of

1125: $\sigma_{\rm dist}$ and of the KS statistic $D$ for the validation set for the

1126: D1 and CC2 ANN photo-z's, for different ranges of $r$ magnitude.

1127: Although the CC2 photo-z distribution is

1128: a worse overall match to the $z_{\rm spec}$ distribution for the

1129: validation set, it works better than D1 for $r>18$.

1130: Since the photometric sample

1131: is dominated by objects at $r>20$ (see Fig. \ref{dist.sdss}),

1132: these results suggest that CC2 should do a better job in

1133: estimating the redshift distribution of the photometric sample,

1134: even though D1 performs better by the standards of $z_{\rm bias}$ and

1135: $\sigma$.

1136:

1137:

1138: \begin{deluxetable}{cc|cc|ccc}

1139: \tablewidth{0pt}

1140: \tablecaption{$\sigma_{\rm dist}$ and KS statistic for Redshift distribution}

1141: \startdata

1142: \hline

1143: \hline

1144: \multicolumn{1}{c}{} & \multicolumn{1}{c|}{} & \multicolumn{2}{c|}{$\sigma_{\rm dist}$} & \multicolumn{2}{c}{KS statistic}\\

1145: \hline

1146: \multicolumn{1}{c}{} & \multicolumn{1}{c|}{$r$-mag bin} & \multicolumn{1}{c}{CC2} & \multicolumn{1}{c|}{D1} & \multicolumn{1}{c}{CC2} & \multicolumn{1}{c}{D1}\\

1147: \hline

1148: &$r < 18$ & 0.0392 & 0.0330 & 0.0632  & 0.0391& \\

1149: &$18<r<19$& 0.0390 & 0.0430 & 0.0520  & 0.0533& \\

1150: &$19<r<20$& 0.0391 & 0.0399 & 0.0366  & 0.0413&\\

1151: &$20<r<21$& 0.0403 & 0.0471 & 0.0363  & 0.0665&\\

1152: &$21<r<22$& 0.0652 & 0.0702 & 0.1051  & 0.1306&\\

1153: \hline

1154: &All & 0.0383 & 0.0338 & 0.0485 & 0.0307&

1155: \enddata

1156: \label{table_sigdist_ks}

1157: \tablecomments{$\sigma_{\rm dist}$ and KS statistic results for CC2 and D1 ANN photo-z's for the validation set.}

1158: \end{deluxetable}

1159:

1160:

1161: The redshift distributions for the validation set are shown in

1162: Fig.~\ref{dndz.valid} for the same bins of $r$ magnitude as in

1163: Table \ref{table_sigdist_ks}.

1164: The D1 and CC2 \zphot \ distributions are shown

1165: in color,

1166: % BW: shaded,

1167: and the solid curves correspond to the \zspec \ distributions.

1168: The similarities between the \zphot \ and \zspec \ distributions

1169: are consistent with the results of

1170: Table \ref{table_sigdist_ks}.

1171:

1172: In \S \ref{estdist}, we noted that the \zspec \ distribution of the

1173: spectroscopic sample, weighted to reproduce the color and magnitude

1174: distributions of the photometric sample, provides an estimate of the

1175: unknown redshift distribution of the photometric sample. The \zphot \

1176: distribution for the photometric sample, computed using ANN D1 or CC2, provides

1177: another estimate of the true redshift distribution for the photometric

1178: sample, but one that we know suffers from bias (e.g., Fig. \ref{plot:statvsm}).

1179: While we have not shown that the weighted \zspec \ estimate of the

1180: redshift distribution is unbiased, it has the advantage that it makes

1181: direct use of the statistical properties of the photometric sample, and

1182: we believe it is our best estimate of the photometric sample redshift distribution.

1183: Our final test of photo-z performance therefore compares the \zphot

1184: \ distribution for the photometric sample for the two ANN cases

1185: with the weighted \zspec \ distribution of the spectroscopic sample.

1186: Agreement between the weighted \zspec \ distribution and either one of the

1187: \zphot \ distributions does not guarantee that they are correct, but

1188: it at least provides a useful consistency check.

1189:

1190: In Fig.~\ref{dndz.photo} we show the estimated redshift distributions of a

1191: random subsample containing $\sim 1\%$ of the objects in the DR6

1192: photometric sample for both the CC2 and D1 ANN cases.

1193: The

1194: % BW: filled curves

1195: colored regions

1196: correspond to the \zphot \ distributions, and the solid lines indicate

1197: the weighted \zspec \ distribution of the spectroscopic sample.

1198: The \zphot \ distributions for CC2 are closer matches to

1199: the weighted  \zspec \ distributions for $r>18$, and they do

1200: not show the peculiar features that the D1 photo-z distributions

1201: display, particularly at faint magnitudes. By the criterion of

1202: producing a more realistic redshift distribution for the photometric

1203: sample, the CC2 ANN estimator is preferred.

1204:

1205: \subsection{Photo-z Errors}

1206:

1207: In order to test the quality of our photo-z error estimates

1208: calculated with the NNE method, we introduce the concept of

1209: empirical error. For a set of objects (within the validation set) with similar

1210: NNE error,

1211: $\sigma_{z}^{\rm NNE}$, the empirical error is defined as the $68\%$

1212: width of the $|z_{\rm phot}-z_{\rm spec}|$ distribution for the set.

1213: If the NNE estimator works properly,

1214: objects with similar NNE error should have similar underlying

1215: error distributions, i.e.,

1216: the NNE error should correlate

1217: well with the empirical error.

1218:
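A minimal Python sketch of this empirical-error calculation is given below. It assumes that ``similar NNE error'' means consecutive objects after sorting by $\sigma_{z}^{\rm NNE}$, and that the ``$68\%$ width'' is the 68th percentile of $|z_{\rm phot}-z_{\rm spec}|$ within each bin. The function name, its arguments, and the handling of leftover objects are illustrative choices, not the code used to build the catalog.

```python
import numpy as np


def empirical_error(z_phot, z_spec, sigma_nne, bin_size=100):
    """Return (mean sigma_nne, empirical error) for bins of similar sigma_nne.

    Assumption: objects are sorted by sigma_nne and split into consecutive
    bins of `bin_size`; leftover objects at the high-sigma end are dropped.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    sigma_nne = np.asarray(sigma_nne, dtype=float)

    # Sort by the estimated error so that each bin holds similar sigma_nne.
    order = np.argsort(sigma_nne)
    abs_err = np.abs(z_phot - z_spec)[order]
    sigma_sorted = sigma_nne[order]

    n_bins = len(sigma_sorted) // bin_size
    used = n_bins * bin_size
    abs_err = abs_err[:used].reshape(n_bins, bin_size)
    sigma_sorted = sigma_sorted[:used].reshape(n_bins, bin_size)

    mean_sigma = sigma_sorted.mean(axis=1)
    # 68% width: value below which 68% of |z_phot - z_spec| in the bin fall.
    empirical = np.percentile(abs_err, 68.0, axis=1)
    return mean_sigma, empirical
```

Plotting the two returned arrays against each other for the validation set would give a diagram analogous to Fig.~\ref{erer}; a well-behaved error estimator yields points close to the one-to-one line.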

1219: Fig.~\ref{erer} shows the performance of the photo-z error estimator

1220: by plotting the computed NNE error $\sigma_{z}^{\rm NNE}$ as a function

1221: of the corresponding empirical error for the validation set.

1222: Results are shown for the D1 and CC2 ANN photo-z's.

1223: The empirical error was calculated for bins containing $100$ objects

1224: with similar $\sigma_z^{\rm NNE}$.

1225: As expected, faint objects ($r > 20$) have larger errors than bright

1226: objects ($r < 20$).

1227: The NNE estimated error correlates well with the

1228: empirical error even for the faint objects, indicating that the

1229: error estimator works properly for all magnitudes.

1230: The bulk of the bright objects have $\sigma_z^{\rm NNE}$ in the range

1231: $0.01$--$0.04$, consistent with the overall {\it rms} photo-z scatter of

1232: $\sigma \sim 0.03$ indicated in Fig.~\ref{zpzs_valid_all}.

1233: Likewise, faint objects have $\sigma_z^{\rm NNE}$ in the range $0.02$--$0.3$,

1234: while $\sigma \sim 0.13$ for those objects.

1235: The NNE error is therefore a robust indicator of an object's

1236: photo-z quality. In particular, we have carried out tests in which we

1237: cut objects with large NNE error from the sample and found that the

1238: remaining sample has smaller photo-z scatter and fewer catastrophic

1239: outliers. For applications in which

1240: photo-z precision is more important than

1241: completeness of the photometric sample, this can be a

1242: useful procedure.

1243:
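The effect of such a cut can be checked on a validation sample with a few lines of Python. The sketch below is only illustrative: the threshold {\tt max\_sigma} is a free parameter chosen by the user (no particular value is prescribed here), the function name is hypothetical, and the rms of $z_{\rm phot}-z_{\rm spec}$ is assumed as the scatter statistic.

```python
import numpy as np


def scatter_after_cut(z_phot, z_spec, sigma_nne, max_sigma):
    """Compare rms photo-z scatter before and after removing objects
    whose NNE error exceeds `max_sigma` (a user-chosen threshold)."""
    resid = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    keep = np.asarray(sigma_nne, dtype=float) <= max_sigma
    if not keep.any():
        raise ValueError("no objects pass the sigma_nne cut")
    return {
        # Completeness of the cut sample relative to the full sample.
        "fraction_kept": float(keep.mean()),
        "rms_all": float(np.sqrt(np.mean(resid ** 2))),
        "rms_kept": float(np.sqrt(np.mean(resid[keep] ** 2))),
    }
```

The returned {\tt fraction\_kept} makes the trade-off between photo-z precision and sample completeness explicit.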

1244: \begin{figure*}

1245:   \begin{center}

1246:     \begin{minipage}[t]{81mm}

1247:       \begin{center}

1248:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f8a.c.eps}}

1249:       \end{center}

1250:     \end{minipage}

1251:     \begin{minipage}[t]{81mm}

1252:       \begin{center}

1253:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f8b.c.eps}}

1254:       \end{center}

1255:     \end{minipage}

1256:   \end{center}

1257:   \caption{Redshift distributions for the galaxies in the

1258:     validation set for different $r$ magnitude bins. {\it Left panels:} ANN D1;

1259:     {\it right panels:} ANN CC2.

1260:     The

1261: % BW: solidly

1262:     colored regions indicate the ANN

1263:     photo-z distributions, while the lines are

1264:     the spectroscopic redshift distributions. By eye,

1265:     both ANN cases recover the true redshift distributions of the

1266:     validation set well, except

1267:     in the faintest magnitude bin, where the photometric errors become large.

1268: }\label{dndz.valid}

1269: \end{figure*}

1270:

1271: \begin{figure*}

1272:   \begin{center}

1273:     \begin{minipage}[t]{81mm}

1274:       \begin{center}

1275:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f9a.c.eps}}

1276:       \end{center}

1277:     \end{minipage}

1278:     \begin{minipage}[t]{81mm}

1279:       \begin{center}

1280:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f9b.c.eps}}

1281:       \end{center}

1282:     \end{minipage}

1283:   \end{center}

1284:   \caption{Estimated redshift distributions for a random subsample of

1285:     1\% of the galaxies in the

1286:     DR6 photometric sample in different $r$-magnitude bins. {\it Left panels:}

1287:     ANN D1; {\it right panels:} ANN CC2. Colors show the \zphot \ distributions.

1288:     The lines show the estimated redshift distributions from the spectroscopic

1289:     sample weighted to match the magnitude and color distributions of the

1290:     photometric sample.

1291:     Even though the two ANN cases correctly recover the

1292:     validation set redshift distribution (Fig.~\ref{dndz.valid}),

1293:     their photo-z

1294:     distributions for the photometric sample disagree. The photo-z distribution

1295:     for D1 shows a peak at

1296:     $z\sim0.4$ that results mainly from the $20 < r < 21$ bin.

1297:     The CC2 distribution does not show such strong features, and in general it matches

1298:     the weighted \zspec \ distribution better.

1299: }\label{dndz.photo}

1300: \end{figure*}

1301:

1302:

1303:

1304: \begin{figure*}

1305:   \begin{center}

1306:     \begin{minipage}[t]{81mm}

1307:       \begin{center}

1308:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f10a.c.eps}}

1309:       \end{center}

1310:     \end{minipage}

1311:     \begin{minipage}[t]{81mm}

1312:       \begin{center}

1313:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f10b.c.eps}}

1314:       \end{center}

1315:     \end{minipage}

1316:   \end{center}

1317:  \caption{The estimated error from the NNE method, $\sigma_z^{\rm NNE}$, is

1318:    shown against the empirical error for objects in the validation set.

1319:    {\it Left panel:} D1 ANN; {\it right panel:} CC2 ANN.

1320:    Each point corresponds to a bin

1321:    of $100$ objects with similar $\sigma_z^{\rm NNE}$.

1322:    The black squares show results for bright objects ($r < 20$),

1323:    the red triangles for faint objects ($r > 20$). As expected, faint

1324:    objects have larger errors, but

1325:    the NNE error correlates well with the empirical error over the full magnitude range.

1326:  }\label{erer}

1327: \end{figure*}

1328:

1329:

1330:

1331: In Fig.~\ref{gausser}, we plot the normalized error distribution,

1332: i.e., the distribution

1333: of $(z_{\rm phot}-z_{\rm spec})/\sigma_{z}^{\rm NNE}$, for objects

1334: in the spectroscopic sample, using the D1 ANN estimator.

1335: The solid black lines are the data, and the dotted red lines

1336: show Gaussian distributions with zero mean and unit variance.

1337: The upper panels show results for the galaxies in the SDSS Main

1338: and LRG spectroscopic samples. The lower panels show results for

1339: all validation-set galaxies, divided into bright

1340: ($r < 20$) and faint ($r > 20$) samples.

1341: These plots indicate that, averaged over the bulk of the spectroscopic

1342: sample, the photo-z estimates are nearly unbiased, the NNE error

1343: provides a good estimate of the true error, and the NNE error can be

1344: approximately interpreted as a Gaussian error in this average sense.

1345: Note that this does {\it not} imply that the photo-z error distributions in

1346: bins of magnitude or redshift are unbiased Gaussians: Figs.~\ref{plot:statvsm}

1347: and \ref{plot:statvsz} show that they are not.

1348:
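In symbols (the notation $\Delta$ is introduced here only for illustration), the quantity whose distribution is plotted is
\begin{equation}
\Delta \equiv \frac{z_{\rm phot}-z_{\rm spec}}{\sigma_{z}^{\rm NNE}} .
\end{equation}
If the photo-z's were unbiased and the NNE errors were exact Gaussian standard deviations, $\Delta$ would have zero mean and unit variance, and about $68\%$ of objects would satisfy $|\Delta|<1$. This is the reference case represented by the dotted curves.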

1349: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1350: \section{Query Flags and Caveats} \label{rec}

1351: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1352:

1353: When querying the SDSS data server to produce the photometric sample for

1354: which we estimated photo-z's, we set the most relevant flags needed to

1355: produce a clean galaxy sample.

1356: However, some applications may require more stringent selection of objects.

1357: We advise users of the catalog to read the documentation about producing a clean

1358: galaxy sample on the SDSS

1359: website\footnote{ {\tt http://cas.sdss.org/dr6/en/help/docs/algorithm.asp} }.

1360: In particular, users should consider requiring the BINNED1 (object detected at $> 5\sigma$) flag and removing

1361: objects with the NODEBLEND (object is a blend but deblending was not possible) flag. The various PHOTO flags

1362: are described in more detail at the above

1363: website as well as in Appendix \ref{query}.

1364:

1365: Finally, we note that the training of the photo-z estimators included only

1366: galaxies, not stars. As a result, photo-z estimates for

1367: stars that contaminate the photometric sample will be wrong, and cutting

1368: objects with low $z_{\rm phot}$ will not remove them. Our tests on

1369: star/galaxy separation in the photometric sample are briefly

1370: described in Appendix \ref{stargal}.

1371:

1372: \begin{figure}

1373:   \begin{center}

1374:     \begin{minipage}[t]{81mm}

1375:       \begin{center}

1376:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f11.c.eps}}

1377:       \end{center}

1378:     \end{minipage}

1379:   \end{center}

1380:  \caption{

1381:    Distributions of

1382:    $(z_{\rm phot}-z_{\rm spec})/\sigma_{z}^{\rm NNE}$

1383:    for objects in the spectroscopic sample, with photo-z's calculated

1384:    using ANN D1; the

1385:    results for ANN CC2 are very similar.

1386:    The solid black lines are the data, and the dotted red lines are

1387:    Gaussians with zero mean and unit variance.  {\it Top left:} SDSS Main

1388:    spectroscopic sample; {\it top right:} SDSS LRG sample; {\it bottom

1389:      left:} validation-set galaxies with $r<20$; {\it bottom right:} validation-set

1390:    galaxies with $r>20$. In all cases the photo-z errors

1391:    are reasonably well modeled by Gaussian distributions.

1392:  }\label{gausser}

1393: \end{figure}

1394:

1395:

1396: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1397: \section{Accessing the Catalog} \label{cat}

1398: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1399:

1400: The photo-z catalog can be accessed from the

1401: {\tt photoz2} table in the DR6 context on the

1402: SDSS CasJobs site, at {\tt http://casjobs.sdss.org/casjobs/}.

1403: A query similar to the one in the Appendix provides all objects

1404: for which we computed photo-z's.

1405: Alternatively, one can simply perform a query that searches for

1406: objects with a {\tt photoz2} entry.

1407:

1408: In addition to the {\tt photoz2} table in the SDSS CAS, an independent

1409: {\tt photoz} table is also available, for which the photo-z's

1410: have been computed using a template-based technique; see

1411: \cite{csa07, ade07}.

1412:

1413:

1414: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1415: \section{Conclusions}\label{con}

1416: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1417: We have presented a public catalog of photometric redshifts for the SDSS DR6

1418: photometric sample using

1419: two different photo-z estimates, CC2 and D1, based on the ANN method.

1420: As a consistency check, we have also calculated photo-z's using the NNP method,

1421: a nearest neighbor approach, which gives very good agreement with

1422: the ANN results.

1423: The CC2 and D1 photo-z results are comparable. For the validation set, the

1424: D1 photo-z estimates have lower photo-z scatter for bright galaxies ($r<20$),

1425: and scatter similar to but slightly smaller than that of

1426: CC2 for objects with $r>20$. Our tests indicate

1427: that the SDSS photo-z estimates are most reliable for galaxies

1428: with $r<20$

1429: and that the scatter increases significantly at fainter magnitudes.

1430: For faint galaxies ($r>20$), we recommend using the CC2 photo-z estimate,

1431: since the CC2 \zphot \ distribution most closely resembles the \zspec \

1432: distribution for the validation set and the weighted \zspec \ estimate

1433: for the redshift distribution of the photometric sample.

1434: For users who wish to use, for simplicity, a single photo-z estimator

1435: over the full

1436: magnitude range, we recommend using CC2.

1437:

1438: Finally, we have demonstrated that the NNE error estimator, included in the

1439: public catalog,

1440: provides a reliable measure of the photo-z errors and that the overall scaled

1441: photo-z errors are nearly Gaussian.

1442:

1443: Funding for the DEEP2 survey has been provided by NSF grants AST-0071048 and AST-0071198. The data presented herein were obtained at the W.M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W.M. Keck Foundation. The DEEP2 team and Keck Observatory acknowledge the very significant cultural role and reverence that the summit of Mauna Kea has always had within the indigenous Hawaiian community and appreciate the opportunity to conduct observations from this mountain.

1444:

1445: Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is {\tt http://www.sdss.org/}.

1446:

1447: The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

1448:

1449: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1450: \appendix

1451: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1452:

1453: \section{Data Query Code}\label{query}

1454:

1455: Here we provide the SDSS database query used to obtain part of the catalog containing

1456: the photometric sample used in this paper.

1457: Notice that the query requires the TYPE flag to be set to 3 (galaxies) and

1458: selects objects with dereddened model magnitude  $r<22.0$ to reflect

1459: the SDSS nominal detection limit.

1460: The query to obtain objects with Right Ascension (RA) in the

1461: range $[0,170)$ is

1462:

1463: \vspace{0.8 cm}

1464:

1465: {\tt

1466: declare @BRIGHT bigint set @BRIGHT=dbo.fPhotoFlags('BRIGHT')

1467:

1468: declare @SATURATED bigint set @SATURATED=dbo.fPhotoFlags('SATURATED')

1469:

1470: declare @SATUR\_CENTER bigint set @SATUR\_CENTER=dbo.fPhotoFlags('SATUR\_CENTER')

1471: \vspace{0.5 cm}

1472:

1473: declare @bad\_flags bigint set @bad\_flags=(@SATURATED|@SATUR\_CENTER|@BRIGHT)

1474: \vspace{0.5 cm}

1475:

1476: select

1477:

1478: objID, ra, dec,type,dered\_u,dered\_g,dered\_r,dered\_i,dered\_z,

1479:

1480: petroR50\_u, petroR50\_g, petroR50\_r, petroR50\_i, petroR50\_z,

1481:

1482: petroR90\_u, petroR90\_g, petroR90\_r, petroR90\_i, petroR90\_z

1483:

1484:

1485:

1486: \vspace{0.5 cm}

1487:

1488:

1489: into MyDb.all\_ra\_0\_170

1490:

1491: FROM PhotoPrimary

1492:

1493: WHERE ((flags \& @bad\_flags)) = 0 AND (dered\_r<=22.0) AND (ra>=0.0) AND (ra<170.0)

1494:

1495: AND (type = 3)

1496:

1497: }

1498:

1499: \vspace{0.5cm}

1500:

1501: Here we provide a brief description of the flags used in the query:

1502: BRIGHT indicates that an object is a duplicate detection of an object with

1503: signal-to-noise greater

1504: than $200 \sigma$; SATURATED indicates that an

1505: object contains one or more saturated pixels;

1506: SATUR\_CENTER indicates that the object center is close to at least one

1507: saturated pixel.

1508: Note that in selecting PRIMARY objects (using PhotoPrimary),

1509: we have implicitly selected objects

1510: that either do {\it not} have the BLENDED flag set

1511: or else have NODEBLEND set or nchild equal to zero.

1512: In addition, the PRIMARY catalog contains no BRIGHT objects, so

1513: the cut on BRIGHT objects in the query above is in fact redundant.

1514: BLENDED objects have multiple peaks detected within them, which PHOTO

1515: attempts to deblend into several CHILD objects.

1516: NODEBLEND objects are BLENDED but no deblending was attempted on them, because

1517: they are either too close to an EDGE, or too large, or one of

1518: their children overlaps an edge. A few percent of the objects in

1519: our photometric sample have NODEBLEND set; some users may wish to

1520: remove them.

1521:

1522: We also suggest that users require objects to have the

1523: BINNED1 flag set.

1524: BINNED1 objects were detected at $\geq 5 \sigma$ significance

1525: in the original imaging frame.

1526:
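For users who apply these cuts to a downloaded table rather than in the database query, the selection amounts to a bitwise test on the {\tt flags} column. The Python sketch below assumes that the bit masks for BINNED1 and NODEBLEND have been obtained separately (e.g., from {\tt dbo.fPhotoFlags}, as in the query above). The function name is hypothetical, and the mask values defined at the end are placeholders for illustration, not the actual SDSS bit values.

```python
import numpy as np


def select_clean_objects(flags, binned1_mask, nodeblend_mask):
    """Boolean mask: BINNED1 set and NODEBLEND not set.

    `flags` is an array of integer PHOTO flag words; the two masks must be
    supplied by the caller (they are not hard-coded here).
    """
    flags = np.asarray(flags, dtype=np.int64)
    has_binned1 = (flags & binned1_mask) != 0
    has_nodeblend = (flags & nodeblend_mask) != 0
    return has_binned1 & ~has_nodeblend


# Placeholder masks for illustration only; obtain the real values from the
# SDSS database, e.g. via dbo.fPhotoFlags('BINNED1').
EXAMPLE_BINNED1 = 1 << 0
EXAMPLE_NODEBLEND = 1 << 1
```

Passing the masks as arguments keeps the sketch independent of any particular data release.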

1527: The SDSS webpage\footnote{ {\tt http://cas.sdss.org/dr5/en/help/docs/algorithm.asp?key=flags} } provides

1528: further recommendations about flags, which we strongly recommend that users read.

1529:

1530: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1531: \section{Tests on star-galaxy separation}\label{stargal}

1532: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1533:

1534: We used the SDSS database TYPE flag to select the galaxy

1535: photometric sample for our photo-z catalogs. To study the robustness

1536: of the TYPE flag in separating galaxies from stars, we also

1537: carried out tests using an independent star-galaxy

1538: classifier.

1539: Here we briefly describe both of these techniques and show the results

1540: obtained on photometric and  spectroscopic samples.

1541:

1542: The TYPE flag is based on the star-galaxy separator in the SDSS PHOTO

1543: pipeline,

1544: described in \cite{lup01} and updated in \cite{aba04}.

1545: For a given object, the pipeline computes the PSF and cmodel

1546: magnitudes in each passband\footnote{ {\tt http://www.sdss.org/dr5/algorithms/photometry.html} },

1547: where the cmodel magnitude is a measure of the flux using a

1548: composite of the best-fit de Vaucouleurs and exponential models of

1549: the light profile.  If the condition

1550:

1551: \begin{equation}

1552: m_{\rm PSF}-m_{\rm cmodel} > 0.145

1553: \end{equation}

1554:

1555: \noindent is satisfied, type is set to GALAXY for that band;

1556: otherwise, type is set to STAR.  The object's global TYPE is

1557: determined by the same criterion, but now applied to the

1558: summed PSF and cmodel fluxes from all passbands in

1559: which the object is detected.

1560: \cite{lup01} show that an earlier version of this simple

1561: cut works at the $95\%$ confidence level for SDSS objects brighter

1562: than $r=21$.

1563:

1564: The second star-galaxy separator we tested is the galaxy probability

1565: defined in \cite{scr02}.

1566: The galaxy probability (hereafter $probgals$) is a Bayesian probability estimate that an object

1567: is a galaxy (and not a star), given the object's magnitudes and

1568: concentration parameter. Here

1569: the concentration parameter is {\it not}

1570: the ratio of Petrosian radii but is

1571: defined as the difference between an

1572: object's PSF and exponential-model $r$ magnitudes.

1573: This concentration parameter is close to zero for stars, is positive

1574: for bright galaxies, and approaches zero as galaxies become fainter.

1575:

1576: We conducted some simple tests to compare these classification schemes.

1577: If we set the Bayesian $probgals$ threshold to a value between 0.5 and 0.9,

1578: then both methods agree on the classification of

1579: more than $90\%$ of the objects for

1580: a random 1\% subset of the SDSS photometric sample.

1581: We also tested the methods on a spectroscopic sample of 29,229

1582: galaxies and stars (counting independent photometric

1583: measurements of each object) from the 2SLAQ and DEEP2 catalogs

1584: with $r < 22$.

1585: Defining stars as objects with $z_{\rm spec}<0.01$, the sample

1586: contains 24,541 galaxies and 4,688 stars. We wish to compare

1587: this spectroscopic ``truth table'' with the photometric classification

1588: of the two methods and with a combined method that classifies

1589: an object as a galaxy if and only if both separators classify it as a

1590: galaxy.

1591: For the purposes of this test, we say that

1592: the Bayesian scheme classifies an object as a galaxy if

1593: $probgals>0.5$. We define galaxy

1594: completeness as the ratio of correctly identified

1595: galaxies to the total number of galaxies in the spectroscopic

1596: sample.

1597: Purity is defined as the ratio of correctly identified galaxies

1598: to the number of objects identified (correctly or not) as galaxies by the

1599: classifier. The purity depends in part on the relative numbers of

1600: galaxies and stars in the spectroscopic sample.

1601:
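The per-band criterion and the two figures of merit defined above can be written compactly in Python. This is a sketch under stated assumptions: it applies the $0.145$ mag cut to the magnitudes of a single band (the global TYPE instead uses fluxes summed over bands), and the function names and boolean-array interface are illustrative choices rather than part of any SDSS software.

```python
import numpy as np


def is_galaxy_single_band(m_psf, m_cmodel, cut=0.145):
    """True where m_PSF - m_cmodel exceeds the cut (classified GALAXY)."""
    return (np.asarray(m_psf, dtype=float)
            - np.asarray(m_cmodel, dtype=float)) > cut


def completeness_and_purity(true_galaxy, classified_galaxy):
    """Galaxy completeness and purity from boolean arrays.

    true_galaxy: spectroscopic truth (True for galaxies).
    classified_galaxy: photometric classification (True for galaxies).
    """
    true_galaxy = np.asarray(true_galaxy, dtype=bool)
    classified_galaxy = np.asarray(classified_galaxy, dtype=bool)
    correct = np.count_nonzero(true_galaxy & classified_galaxy)
    n_true = np.count_nonzero(true_galaxy)
    n_classified = np.count_nonzero(classified_galaxy)
    completeness = correct / n_true if n_true else float("nan")
    purity = correct / n_classified if n_classified else float("nan")
    return completeness, purity


def combined_classifier(type_galaxy, bayes_galaxy):
    """Galaxy if and only if both separators classify it as a galaxy."""
    return (np.asarray(type_galaxy, dtype=bool)
            & np.asarray(bayes_galaxy, dtype=bool))
```

As an example of how purity depends on the sample composition: with the 24,541 galaxies and 4,688 stars quoted above, a classifier that labels every object a galaxy has completeness $1$ and purity $24541/29229 \approx 0.84$.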

1602:

1603: Fig.~\ref{compur} shows the completeness and purity of the

1604: resulting galaxy catalogs in bins of $r$

1605: magnitude for this spectroscopic sample.

1606: Overall, the Bayesian separator and PHOTO TYPE

1607: produce similar results for galaxy purity and completeness. Moreover,

1608: the agreement between the two classification methods is quite good on

1609: an object-by-object basis.

1610: The

1611: Bayesian separator with $probgals > 0.5$ achieves slightly higher

1612: completeness and slightly lower purity.

1613: By varying the $probgals$ boundary, we could improve the purity of the

1614: Bayesian galaxy sample at the expense of degrading its completeness.

1615: We note that

1616: the best value of $probgals$ to use in defining a galaxy photometric

1617: sample depends on the scientific applications of the sample, i.e.,

1618: on whether completeness or purity is the more important feature.

1619: In statistical applications, instead of defining a galaxy sample one

1620: can also choose to weight objects by

1621: their Bayesian probability \citep{scr02}.

1622:

1623: Based on this test, we conclude that

1624: the photometric sample for which we have estimated photo-z's has

1625: better than 90\% galaxy purity.

1626:

1627: \begin{figure}

1628:   \begin{center}

1629:     \begin{minipage}[t]{81mm}

1630:       \begin{center}

1631:       \resizebox{81mm}{!}{\includegraphics[angle=0]{f12.c.eps}}

1632:       \end{center}

1633:     \end{minipage}

1634:   \end{center}

1635:  \caption{{\it Top panel:} completeness and {\it bottom panel:} purity

1636: for the

1637: Bayesian and PHOTO TYPE galaxy classifications as well as for a combination

1638: of the two, using a sample of galaxies and stars with spectroscopic classification.

1639: Results for the Bayesian separator have the $probgals$ lower bound set

1640: to $0.5$.}

1641: \label{compur}

1642: \end{figure}

1643:

1644:

1645: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1646: \section{Photometric Redshifts for SDSS DR5}

1647: \label{photdr5}

1648: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1649:

1650: An earlier version of the photo-z catalog, produced for SDSS

1651: Data Release 5 (DR5), is publicly

1652: available on the SDSS DR5 website (and is also called

1653: {\tt photoz2}).

1654: The methods used to construct that photo-z catalog were similar to the

1655: ones employed here for DR6, but the latter incorporates a number of

1656: important

1657: improvements. Here we briefly outline the differences between the two.

1658: We {\it strongly} recommend use of the DR6 photo-z catalog instead of the

1659: DR5 catalog.

1660:

1661: The photometric galaxy sample selection has improved from DR5 to DR6,

1662: because we used more stringent cuts in defining the DR6 sample.

1663: The DR6 sample selection is described above in Appendix \ref{query}.

1664: The DR5 photometric galaxy sample selection required

1665: the cmodel and model $r$ magnitudes to lie in the ranges

1666: $r_{\rm cmodel} \in (14.0,22.0)$ and

1667: $r_{\rm model} \in (13.5,22.5)$, and also required the value of

1668: the smear polarizability \citep{she04} to be $m_r>0.8$. Also, for DR5,

1669: star-galaxy separation used the Bayesian estimator (see

1670: Appendix \ref{stargal}) with the value $probgals >0.8$, while for DR6

1671: we used PHOTO TYPE.

1672: The additional cuts used for the DR6 catalog have produced

1673: a cleaner and more reliable galaxy sample.

1674:

1675: \begin{deluxetable}{ccc}

1676: \tablewidth{0pt}

1677: \tablecaption{DR5 Catalog $flag$}

1678: \startdata

1679: \hline

1680: \hline

1681: \multicolumn{1}{c}{$flag$}

1682: & \multicolumn{1}{c}{Number of Galaxies}

1683: & \multicolumn{1}{c}{Object Description}\\

1684: \hline

1685: - & $86.1$ million                 &All \\

1686: 0 & $12.6$ million               &\hspace{0.075 in}Complete \& bright\\

1687: 1 & $\hspace{0.06 in}0.6$ million &Incomplete \& bright\\

1688: 2 & $59.0$ million               &Complete \& faint \\

1689: 3 & $13.9$ million               &\hspace{-0.075 in}Incomplete \& faint \\

1690: \enddata

1691: \label{tableflags}

1692: \tablecomments{The flag scheme for the DR5 catalog is based on object

1693: detection in some/all passbands and the $r$ magnitude. Incomplete objects are undetected in

1694: at least one of the passbands ($ugriz$) and faint objects have $r>20$.

1695: }

1696: \end{deluxetable}

1697:

1698: \begin{deluxetable}{ccc}

1699: \tablewidth{0pt}

1700: \tablecaption{DR6 Catalog $flag$}

1701: \startdata

1702: \hline

1703: \hline

1704: \multicolumn{1}{c}{$flag$}

1705: & \multicolumn{1}{c}{Number of Galaxies}

1706: & \multicolumn{1}{c}{Object Description}\\

1707: \hline

1708: - & $77.4$ million                 &All \\

1709: 0 & $11.5$ million               &bright\\

1710: 2 & $65.9$ million               &faint \\

1711: \enddata

1712: \label{tableflags6}

1713: \tablecomments{The $flag$ scheme for the DR6 catalog is based solely on the

1714: $r$ magnitude: faint objects have $r>20$.

1715: }

1716: \end{deluxetable}

1717:

1718: The DR5 photo-z catalog included

1719: a number of flags describing the expected photo-z

1720: quality, shown in Table \ref{tableflags}.

1721: These flags were based on the detection or non-detection of the object in

1722: all passbands

1723: and on the value of the $r$ model magnitude. An object was classified

1724: as bright (faint) if $r<20$ ($r>20$). An object was flagged as ``incomplete''

1725: if it was not detected in all five SDSS passbands. Table \ref{tableflags}

1726: shows the corresponding flag values and the number of objects assigned

1727: each flag value. For the DR6 sample, given the stricter

1728: sample selection, a very small number of objects would have been

1729: classified as incomplete by the definition above, and they have

1730: been removed from the sample. As a result, for DR6, we only

1731: supply the bright/faint flag, as shown in Table \ref{tableflags6}.

1732:
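The flag values in Tables \ref{tableflags} and \ref{tableflags6} follow a simple two-bit pattern, which the Python sketch below makes explicit. The function names are hypothetical, and the treatment of objects with exactly $r=20$ (counted here as faint) is an assumption, since the text only specifies $r<20$ and $r>20$.

```python
def dr5_flag(r_mag, detected_in_all_bands):
    """DR5 quality flag: 0/1 bright (complete/incomplete), 2/3 faint."""
    faint = r_mag >= 20.0  # assumption: r == 20 is treated as faint
    incomplete = not detected_in_all_bands
    return 2 * int(faint) + int(incomplete)


def dr6_flag(r_mag):
    """DR6 flag: 0 for bright objects, 2 for faint objects."""
    return 2 if r_mag >= 20.0 else 0  # same r == 20 assumption as above
```

With this encoding, the DR6 values $0$ and $2$ coincide with the DR5 values for objects detected in all five passbands.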

1733: The spectroscopic training set used for the DR6 photo-z catalog

1734: has important additions compared to

1735: the one used for the DR5 catalog. In particular,

1736: for DR6 we added the DEEP2 spectroscopic catalog (which became

1737: publicly available), which made the training set more complete

1738: at faint magnitudes.

1739: We also applied more stringent spectroscopic quality cuts

1740: to the training set used for DR6.

1741:

1742: Unlike the DR5 training set, the DR6 training set does not contain

1743: objects from the SDSS ``special'' plates, extra spectroscopic observations

1744: designed to target specific objects for various scientific studies \citep{ade06}.

1745:  In our tests, we find that

1746: the lack of special plates does not result in any degradation of the

1747: photo-z quality.

1748:

1749: The photo-z algorithm also changed from DR5 to DR6: we increased the

1750: number of hidden-layer nodes in the ANN and we added the concentration

1751: indices to the data inputs.

1752: Our tests indicated that this leads to

1753: improved photo-z performance according to our metrics.

1754: In addition, the CC2 method differs from DR5 photo-z's further in that

1755: CC2 uses only the color information and not the raw magnitudes.

1756: For general-purpose, full-sample photo-z's, we recommend using CC2

1757: photo-z's over both DR5 and D1 photo-z's.

1758: Finally,

1759: we have carried out more extensive tests of the DR6 photo-z's than

1760: were done for DR5, increasing our confidence in the robustness of

1761: the photo-z estimates.

1762:

1763: \bibliographystyle{apj}

1764: \bibliography{ms}

1765:

1766: \end{document}

1767: