
1: \documentclass{emulateapj}

2: %\documentclass[12pt,preprint]{aastex}

3:

4: \newcommand{\etal}{{\it et al.}}

5:

6: \begin{document}

7:

8: \title{Photometric Redshifts and Photometry Errors}

9:

10: \shorttitle{Photometric Redshifts and Photometry Errors}

11:

12: \author{D. Wittman, P. Riechers, and V.~E. Margoniner\altaffilmark{1}}

13: \affil{Physics Department, University of California, Davis,

14:   CA 95616; dwittman@physics.ucdavis.edu}

15: \altaffiltext{1}{Current address: Physics Department, California State

16: University, Sacramento, CA 95819}

17:

18: \keywords{surveys---galaxies: photometry---methods: statistical}

19:

20:

21: \begin{abstract}

22: We examine the impact of non-Gaussian photometry errors on photometric

23: redshift performance.  We find that they greatly increase the scatter,

24: but this can be mitigated to some extent by incorporating the correct

25: noise model into the photometric redshift estimation process.

26: However, the remaining scatter is still equivalent to that of a much

27: shallower survey with Gaussian photometry errors.  We also estimate

28: the impact of non-Gaussian errors on the spectroscopic sample size

29: required to verify the photometric redshift rms scatter to a given

30: precision.  Even with Gaussian {\it photometry} errors, photometric

31: redshift errors are sufficiently non-Gaussian to require an order of

32: magnitude larger sample than simple Gaussian statistics would

33: indicate.  The requirements increase from this baseline if

34: non-Gaussian photometry errors are included.  Again the impact can be

35: mitigated by incorporating the correct noise model, but only to the

36: equivalent of a survey with much larger Gaussian photometry errors.

37: However, these requirements may well be overestimates because they are

38: based on a need to know the rms, which is particularly sensitive to

39: tails.  Other parametrizations of the distribution may

40: require smaller samples.

41: \end{abstract}

42:

43: \section{Introduction}

44:

45: Photometric redshifts (Connolly \etal\ 1995, Hogg \etal\ 1998, Benitez

46: 2000) are of increasing importance in observational tests of

47: cosmology.  Predicting photometric redshift performance has therefore

48: become an important part of planning large optical surveys.  There are

49: two distinct aspects of performance to consider.  First, there are

50: straightforward goals of accuracy and precision.  Second, to control

51: systematic errors in the downstream science, one must be able to {\it

52:   know}, in some cases rather stringently, the accuracy and precision

53: of the photometric redshifts in the actual survey (Ma \etal\ 2006,

54: Huterer \etal\ 2006).  Knowing the actual photometric redshift

55: precision can be more important than maximizing the precision.  For

56: example, cosmic shear tomography calls for relatively wide redshift

57: bins ($dz \sim 0.2$).  Leakage between bins, to the extent that it is

58: known, can be precisely incorporated into comparisons between models

59: and data.  This by itself is not very demanding in terms of

60: photometric redshift precision.  However, in a large survey with very

61: small statistical errors, the leakage must be known very precisely to

62: avoid nontrivial systematic errors.  Ma \etal\ (2006) estimate that

63: for cosmic shear tomography with next-generation surveys, the bias and

64: rms scatter in each redshift bin must be known to $\sim$0.003 to avoid

65: degrading the shot-noise-limited constraints on dark energy.

66:

67: To first order, photometric redshift performance depends on filter

68: set, signal-to-noise (S/N), and the desired range of redshifts and

69: galaxy types.  Here we wish to call attention to an often overlooked

70: aspect: photometry errors.  Photometric redshift simulations and

71: real-life implementations typically assume Gaussian photometry errors.

72: Real data are more complicated. As one anecdote, Cameron \& Driver

73: (2007) note that in one catalog of 42 galaxies with both photometric

74: and spectroscopic redshifts, there were six outliers, all of which

75: had questionable photometry due to saturation, neighbors, or multiple

76: nuclei.  In this paper we show that knowing the true distribution of

77: errors is important for optimizing photometric redshift precision. We

78: also discuss how that in turn affects the size of the spectroscopic

79: sample required to characterize the photometric redshift errors in a

80: survey.

81:

82: \section{Methods}

83:

84: We conduct four sets of simulations built around the following basic

85: setup.  We use the Bayesian Photometric Redshift (BPZ, Benitez 2000)

86: code, which uses a set of template galaxy spectral energy

87: distributions (SEDs) and a set of priors to help break degeneracies in

88: color space.  We chose the six SED templates and the HDFN prior

89: detailed in Benitez (2000).  BPZ is representative of one of two types

90: of methods in the photometric redshift community.  We discuss possible

91: impacts on the other type, training-set methods, in

92: \S\ref{sec-discussion}.  The choice of filter set is not important for

93: this demonstration.  We use the same filter set (F300W, F450W, F606W,

94: F814W, J, H, K) used for the Hubble Deep Field North (HDFN)

95: photometric redshifts discussed in Hogg \etal\ (1998), Benitez (2000),

96: and Fernandez-Soto \etal\ (1999, 2001).

97:

98: Each simulation generates a synthetic catalog of 6000 galaxies evenly

99: spread throughout the F814W magnitude range 20--26.  This and other

100: aspects of the simulations are not realistic, but are adopted to

101: facilitate analysis by covering parameter space evenly.  The results

102: presented here therefore do not apply quantitatively to any real

103: survey, but they demonstrate the issues.  The simulator uses each

104: galaxy's magnitude to choose a random type and redshift following the

105: distributions described by the priors.  It then looks up the synthetic

106: observer-frame colors of that type at that redshift, and adds noise

107: (the character of which varies with the simulation) before saving the

108: catalog.  An unrealistic aspect of the noise in all simulations is

109: that it is a fixed percentage of the model flux.  That is, every

110: galaxy is observed at the same S/N, regardless of magnitude, redshift,

111: or filter.  This is another analysis convenience.  The effect of

112: varying S/N was explored in one specific case by Margoniner \& Wittman

113: (2007), and will have to be customized to each survey.

114:

115: We then run the catalogs through BPZ, with the HDFN prior turned on,

116: and analyze the performance in terms of

117: $\delta z \equiv {z_{\rm phot} - z_{\rm spec} \over 1+ z_{\rm spec}}$,

118: specifically the bias

119: $\bar{\delta z}$ and the scatter $\delta z_{rms}$.

120:
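
As a sketch of how these two statistics can be computed for a catalog
of $N$ galaxies (the convention of not subtracting the mean inside the
rms is an assumption made here for illustration; the text above does
not specify it):
$$ \bar{\delta z} = {1\over N}\sum_{i=1}^{N} \delta z_i, \qquad
\delta z_{rms} = \sqrt{{1\over N}\sum_{i=1}^{N} (\delta z_i)^2}. $$
For example, a galaxy with $z_{\rm spec}=1.00$ and $z_{\rm phot}=1.10$
has $\delta z = 0.10/2.00 = 0.05$.  When the bias is small compared to
the scatter, the choice of convention matters little: for a bias of
0.003 and an rms of 0.026, subtracting the mean would change the rms
by less than 1\%, since $\sqrt{0.026^2 - 0.003^2} \approx 0.0258$.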

121: \section{Realizations}

122:

123: As baselines, we do two simulations with Gaussian noise: SIM1 with 5\%

124: noise ($S/N=20$) and SIM2 with 10\% noise ($S/N=10$).  These

125: photometry error distributions are shown in Figure~\ref{fig-phot}.

126: The resulting $\delta z$ distributions are shown in

127: Figure~\ref{fig-dz}.  In both cases, the bias is small (0.003 or less

128: in absolute value) and not inconsistent with zero.  The scatter

129: depends strongly on S/N: $\delta z_{rms} = 0.026$ for S/N of 20,

130: increasing to 0.070 for S/N of 10.  We also did a run with $S/N=100$,

131: not shown in the figures: $\delta z_{rms} = 0.004$.  This is extremely

132: tight because the quoted S/N is achieved in {\it each} band for {\it

133: each} galaxy.

134:
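
For readers who think in magnitudes, the fractional flux noise can be
converted with the standard first-order relation (valid for small
errors; the symbol $\sigma_m$ is introduced here only for this
illustration):
$$ \sigma_m \approx {2.5 \over \ln 10}\,{\sigma_f \over f}
   \approx {1.086 \over S/N}. $$
This gives $\sigma_m \approx 0.011$, 0.054, and 0.109 mag for $S/N=100$,
20, and 10 respectively.  Comparing the quoted scatters, $\delta
z_{rms}$ grows by a factor of $0.026/0.004 = 6.5$ when the noise
increases fivefold, and by $0.070/0.026 \approx 2.7$ when it doubles
again, i.e. faster than linearly over this range.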

135: \begin{figure}

136: \centerline{\resizebox{3in}{!}{\includegraphics{f1.eps}}}

137: \caption{Photometry error distributions, SIM1: 5\% Gaussian (solid

138:   black curve); SIM2: 10\% Gaussian (dotted red curve); SIM3 and SIM4:

139:   5\% Gaussian with exponential tails (dashed blue curve).

140: \label{fig-phot}}

141: \end{figure}

142:

143: \begin{figure}

144: \centerline{\resizebox{3in}{!}{\includegraphics{f2.eps}}}

145: \caption{Distributions of $\delta z$: colors and linetypes are as in

146:   previous figure, with the addition of SIM4 (long-dash magenta

147:   curve), which uses the non-Gaussian noise model in the photometric

148:   redshift estimation.

149: \label{fig-dz}}

150: \end{figure}

151:

152: Next, we add non-Gaussian tails to the photometry error distribution.

153: We adopt a functional form

154: $$ p(\delta f) = {1\over\sigma\sqrt{2\pi}+2AB} \left[\exp\left(-{(\delta f)^2

155:   \over 2 \sigma^2 }\right) + A \exp\left(-{|\delta f| \over B}\right)\right]$$ where $\delta

156:   f$ is the flux error, $\sigma$ describes the width of the Gaussian

157:   core, and the parameters A and B describe the tails.  For a given

158:   $\sigma$, the fraction of galaxies in the tails is sensitive to

159:   changes in the product $AB$ but relatively insensitive to changes in

160:   A and B as long as the product is held constant.  There is little

161:   published data on realistic values of A and B.  Margoniner \&

162:   Wittman (2007) briefly describe photometry simulations in which

163:   synthetic galaxies are added to real images from the Deep Lens

164:   Survey (DLS, Wittman \etal\ 2002).  We roughly match the fraction of

165:   objects in that tail, but with two symmetric tails and $\sigma=0.05$

166:   as in SIM1, by setting $A=0.1$ and $B=0.15$ or $3\sigma$.  For this

167:   choice of A and B, used in SIM3 and SIM4 and shown as the dashed blue

168:   curve in Fig.~\ref{fig-phot}, the tails begin to dominate over the

169:   Gaussian core at 2.51 times the rms of the Gaussian core, and 9.4\%

170:   of the galaxies are ``in'' the tails, compared to 1.2\% falling

171:   outside 2.51$\sigma$ for a pure Gaussian.  The rms of the

172:   distribution is 0.103, very close to that of SIM2.

173:
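
The numbers quoted above can be checked with a short calculation,
sketched here under the assumption that the distribution is normalized
to unit integral over $-\infty < \delta f < \infty$.  The Gaussian term
integrates to $\sigma\sqrt{2\pi}$ and the two-sided exponential term to
$2AB$, so the normalization is
$$ N = \sigma\sqrt{2\pi} + 2AB = 0.1253 + 0.0300 = 0.1553 $$
for $\sigma=0.05$, $A=0.1$, $B=0.15$.  The tails overtake the core where
the two terms are equal; writing $|\delta f| = t\sigma$,
$$ e^{-t^2/2} = A\,e^{-t\sigma/B} \quad\Rightarrow\quad
   t = {\sigma\over B} + \sqrt{\left({\sigma\over B}\right)^2 - 2\ln A}
     = 0.333 + 2.172 \approx 2.51. $$
The fraction of objects beyond that point is
$$ {\sigma\sqrt{2\pi}\,{\rm erfc}(t/\sqrt{2}) + 2AB\,e^{-t\sigma/B} \over N}
   = {0.1253\times 0.0122 + 0.0300\times 0.434 \over 0.1553} \approx 0.094, $$
and the rms follows from the second moment,
$$ \langle (\delta f)^2 \rangle = {\sigma^3\sqrt{2\pi} + 4AB^3 \over N}
   = {3.13\times10^{-4} + 1.35\times10^{-3} \over 0.1553} \approx 0.0107, $$
so that the rms is $\sqrt{0.0107} \approx 0.103$.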

174: As a comparison, the photometry error distribution for bright,

175: unresolved objects in the Sloan Digital Sky Survey (SDSS) is published

176: in Fig. 3 of Ivezi{\'c} \etal\ (2003), who state that 0.9\% of objects

177: lie outside of $\pm 3\sigma$ (where $\sigma=0.02$), vs. 0.3\% for a

178: pure Gaussian.  This observation, and the figure, are reasonably

179: approximated by $A=0.1$ and $B=0.0235$ or 1.2$\sigma$.  These tails

180: are much smaller than used in SIM3 and SIM4, which have 7.3\% of their

181: galaxies outside $\pm 3\sigma$.  However, the available SDSS data are

182: for {\it bright ($g<20.5$) point sources}.  Photometry is notably more

183: difficult for extended sources and for faint sources.  In the DLS

184: simulations, $A$ is consistent with zero for bright ($20<R<22$)

185: galaxies, and grows steadily with magnitude.  Of course, most of the

186: galaxies in a deep survey are at the faint end.  Therefore, while

187: noting the near-Gaussianity of the SDSS bright point-source

188: photometry, we believe that heavier tails are currently more

189: appropriate for faint galaxies in deep ground-based surveys.

190:
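
The quoted fractions outside $\pm 3\sigma$ can be reproduced with the
same functional form, again assuming unit normalization $N =
\sigma\sqrt{2\pi} + 2AB$.  The fraction outside $\pm k\sigma$ is
$$ F(k) = {\sigma\sqrt{2\pi}\,{\rm erfc}(k/\sqrt{2}) + 2AB\,e^{-k\sigma/B}
   \over \sigma\sqrt{2\pi} + 2AB}. $$
For the SDSS-like parameters ($\sigma=0.02$, $A=0.1$, $B=0.0235$),
$$ F(3) = {0.0501\times 0.0027 + 0.0047\times 0.0778 \over 0.0548}
   \approx 0.009, $$
while for the SIM3 and SIM4 parameters ($\sigma=0.05$, $A=0.1$, $B=0.15$),
$$ F(3) = {0.1253\times 0.0027 + 0.0300\times 0.368 \over 0.1553}
   \approx 0.073. $$
A pure Gaussian ($A=0$) gives $F(3) = {\rm erfc}(3/\sqrt{2}) \approx 0.003$.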

191: We attribute the Gaussian cores of these distributions to photon

192: statistics, which is the nominal error reported by most photometry

193: packages, and the tails to other effects such as crowding.  This is a

194: reasonable approximation for ground-based data, with many sky photons

195: per pixel and galaxies usually much fainter than sky.  For space-based

196: photometry, crowding is less important, but photon statistics are less

197: Gaussian due to the smaller number of photons.  The tails in this

198: paper are meant to emulate ground-based surveys as described above.

199: We quantify their impact by estimating redshifts in SIM3 using the

200: nominal Gaussian photometry error as input to BPZ.  Averaged over 100

201: realizations, $\bar{\delta z}$ remained small (0.0038), but $\delta

202: z_{rms}$ increased to 0.092.  The distribution is shown as the blue

203: short-dash histogram in Fig.~\ref{fig-dz}.

204:

205: Clearly, these tails are very harmful.  Adding them to the $S/N=20$

206: distribution more than tripled $\delta z_{rms}$.  In fact, {\it

207: doubling} the Gaussian photometry noise had less impact on $\delta

208: z_{rms}$ than did adding these tails.  Surveys will have to control

209: the tails of their photometry error distributions if they are to reach

210: the photometric redshift performance expected based on their filter

211: set and S/N.  Modern surveys do recognize this and work to reduce the

212: tails, but tails will always be present at some level.  Legacy surveys

213: may have non-Gaussian errors frozen into their data, and new surveys

214: will find it expensive to eliminate all non-Gaussian sources of error.

215: Therefore, we investigate the extent to which knowledge of these

216: errors can render them less damaging to photometric redshifts.

217:

218: \section{Living with Non-Gaussian Errors}

219:

220: Accounting for these errors is straightforward.  In the BPZ code, the

221: probability of observing colors $C$ given a model SED type $T$ and

222: redshift $z$, $p(C|T,z)$ is simply a Gaussian of width set by the

223: nominal photometry errors for that galaxy.  In SIM4, we use the same

224: input photometry as SIM3 but replace that noise model with the full

225: heavy-tailed distribution used to generate the catalog.  The

226: resulting $\delta z$ distribution is shown in Fig.~\ref{fig-dz} as the

227: long-dash magenta histogram.  The outliers in $\delta z$ which

228: appeared in SIM3 have now largely disappeared, and $\delta z_{rms}$ is

229: down to 0.072.  This is comparable to $\delta z_{rms}$ in SIM2, which

230: had twice the simulated sky noise, but no tails.

231:
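
Schematically, and only as an illustration (this is not BPZ's actual
expression, and the per-band residual $\delta f_k$ is notation
introduced here), the change amounts to replacing a product of
Gaussians over the bands $k$ with a product of the heavy-tailed
distribution:
$$ p_{\rm Gauss}(C|T,z) \propto \prod_k \exp\left(-{(\delta f_k)^2 \over
   2\sigma^2}\right) \quad\longrightarrow\quad
   p_{\rm tail}(C|T,z) \propto \prod_k \left[\exp\left(-{(\delta f_k)^2
   \over 2\sigma^2}\right) + A\exp\left(-{|\delta f_k| \over B}\right)\right], $$
where $\delta f_k$ is the fractional difference between the observed
flux and the model flux of type $T$ at redshift $z$ in band $k$.  To
see why this suppresses outliers, consider a single band with a
$5\sigma$ residual ($\delta f_k = 0.25$ for $\sigma=0.05$).  The Gaussian
factor is $e^{-12.5} \approx 3.7\times10^{-6}$, whereas the heavy-tailed
factor with $A=0.1$ and $B=0.15$ is $3.7\times10^{-6} + 0.1\,e^{-1.67}
\approx 0.019$, about 5000 times larger.  Under the Gaussian model one
discrepant band can therefore veto the correct type and redshift,
while the heavy-tailed model penalizes it only mildly.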

232: The scatter in $\delta z$ increases to 0.082 if one uses the

233: unmodified BPZ code assuming Gaussian errors, but with an rms of 0.1

234: instead of 0.05, to roughly approximate the wider distribution of

235: photometry errors. As another comparison case for incorrect noise

236: models, we estimated redshifts from a SIM2 realization using the SIM1

237: noise model.  In this case, $\delta z_{rms}$ changed by only 0.003,

238: which was not quite significant given the sample size.  Thus, it

239: appears that if the photometry errors are Gaussian, knowing the width

240: of that Gaussian is not very critical.  We see from Fig.~\ref{fig-dz}

241: that it is the 1 in $\sim$500 outlier that is responsible for the poor

242: performance of SIM3.  SIM2 lacks extreme outliers, so qualitatively,

243: its better performance makes sense despite its broader core.  Yet this

244: degree of insensitivity to the Gaussian width is somewhat surprising.

245:

246: For comparison, we perform a version of SIM4 in which the tails are

247: much less prominent, as in the SDSS bright point-source photometry:

248: $\sigma=0.05$, $A=0.1$, and $B=0.06$ (1.2$\sigma$).  We find that

249: $\delta z_{rms}=0.031$, with the noise model affecting only the fourth

250: decimal place.  The photometry tails are apparently small enough that

251: including them in the noise model is not very helpful, but overall

252: performance is still significantly worse than with no tails at

253: all. (SIM1 had $\delta z_{rms}=0.026$, while the variation from

254: realization to realization is $\sim0.001$ and these numbers are quoted

255: after averaging over 100 realizations.)  This indicates that even

256: small photometry tails can have a significant impact on photometric

257: redshift performance.

258:

259: \section{Discussion}

260: \label{sec-discussion}

261:

262: It is not surprising that tails in the photometry error distribution

263: can cause outliers in the $\delta z$ distribution.  However, a number

264: of points are worth remarking:

265: \begin{itemize}

266:

267: \item Adding heavy tails (comprising $\leq 10\%$ of the galaxies)

268: caused more increase in $\delta z_{rms}$ than did {\it doubling} the

269: Gaussian photometry error.  In other words, the photometric redshift

270: performance of a survey with large tails could be worse than that of a

271: survey with {\it half} the S/N but with no tails.  Surveys should

272: therefore pay close attention to reducing the tails of the color

273: errors.  This is not the same as reducing the tails of the flux

274: errors.  As an extreme example, if an equal fraction of light is lost

275: in all filters, the colors are unaffected.

276:

277: \item Assuming that non-Gaussian errors can never be entirely

278:   eliminated, the effect of the tails on photometric redshift

279:   performance can be mitigated by including an accurate noise model in

280:   the photometric redshift process.  This will in

281:   turn require extensive Monte Carlo simulations which include all

282:   important sources of non-Gaussian errors, such as crowding and

283:   complex galaxy morphology.  In addition, the importance of the tails

284:   is likely to vary with magnitude, seeing, etc.

285:

286: \item No clear rule is evident for required accuracy of the noise

287: model.  Photometric redshift precision was not significantly affected

288: when errors and model were both Gaussian but the rms was wrong by a

289: factor of two.  When errors were heavy-tailed, approximating them with a

290: Gaussian of the same rms won back about half of the precision that

291: could be won back with the fully correct noise model.

292:

293: \item Even very small tails have a measurable impact on $\delta

294: z_{rms}$, but in this case the noise model made no measurable

295: difference.

296:

297: \end{itemize}

298:
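
The distinction between flux errors and color errors in the first item
can be made explicit.  For two bands $a$ and $b$ with fluxes $f_a$ and
$f_b$, the color is
$$ m_a - m_b = -2.5\log_{10}{f_a \over f_b}. $$
If a fraction $\epsilon$ of the light is lost in both bands, the ratio
$(1-\epsilon)f_a / (1-\epsilon)f_b$ is unchanged and so is the color.
If the loss occurs in band $a$ only, the color shifts by
$-2.5\log_{10}(1-\epsilon) \approx 1.086\,\epsilon$; for an illustrative
$\epsilon = 0.1$ this is 0.114 mag.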

299: The tails also have a disproportionate impact on the problem of

300: knowing $\delta z_{rms}$ precisely for each redshift bin, whereas

301: precision on $\bar{\delta z}$ did not suffer substantially.  If the

302: $\delta z$ distribution is Gaussian, the spectroscopic sample size

303: required to calibrate $\delta z_{rms}$ to a desired accuracy

304: $\sigma_{cal}$ is $\sim {(\delta z_{rms})^2 \over 2 \sigma_{cal}^2}$ (this

305: of course assumes that the spectroscopic sample is representative of

306: the photometric sample).  For $\sigma_{cal}=0.003$ and a class of

307: sources with $\delta z_{rms} = 0.026$ as in SIM1, only $\sim 40$

308: galaxies would be required.  However, bootstrap resampling of SIM1

309: shows that seven times more galaxies are required to know $\delta

310: z_{rms}$ to the same accuracy, due to its non-Gaussian tails (which

311: stem from the properties of galaxies in color space, not from the

312: photometry).  For SIM2, the factor is thirteen, presumably because the

313: greater noise in SIM2, although still Gaussian, allows more

314: near-degeneracies in color space to come into play.  For SIM3 with its

315: heavy photometry tails, the factor is $\sim$50.  However, this can be

316: much reduced simply by incorporating the correct noise model into the

317: photometric redshift estimation.  SIM4 requires ``only'' $\sim 25$

318: times as many galaxies as the Gaussian prediction would suggest, and

319: the Gaussian prediction is itself $\sim 2$ times smaller than for

320: SIM3, because of the smaller $\delta z_{rms}$ in SIM4.  Of course, it would be

321: preferable to reduce non-Gaussian tails in the underlying photometry

322: as much as possible, as dramatically illustrated by the large

323: remaining differences between SIM4 and either SIM1 or the simulation

324: with SDSS-like tails.

325:
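
The arithmetic behind these sample sizes can be sketched as follows.
For $N$ draws from a Gaussian of width $\delta z_{rms}$, the standard
error on the measured width is approximately $\delta z_{rms}/\sqrt{2N}$;
setting this equal to $\sigma_{cal}$ gives
$$ N_{\rm Gauss} \approx {(\delta z_{rms})^2 \over 2\sigma_{cal}^2}. $$
With $\sigma_{cal}=0.003$ and the scatters quoted above, $N_{\rm Gauss}
\approx 38$ for SIM1 ($\delta z_{rms}=0.026$), 270 for SIM2 (0.070), 470
for SIM3 (0.092), and 290 for SIM4 (0.072).  Multiplying by the
bootstrap factors of 7, 13, $\sim$50, and $\sim$25 gives rough
requirements of about 260, 3500, 24000, and 7200 galaxies
respectively.  The last two figures inherit the approximate nature of
the bootstrap factors and should be read as order-of-magnitude
estimates.  The Gaussian prediction for SIM4 is smaller than that for
SIM3 by $470/290 \approx 1.6$.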

326: We caution that this procedure may substantially overestimate

327: spectroscopic sample requirements.  The requirements are based on the Gaussian

328: model of photometric redshift errors employed by Ma \etal\ (2006), who

329: derived a prescription for precision of our knowledge of $\delta

330: z_{rms}$.  But the rms of a distribution is driven by its tails, so

331: that the tails seem to be all-important here.  If the photometric

332: redshift error model used in the cosmological parameter estimation

333: were modeled differently, the tails could assume a more proportional

334: influence, and fewer spectroscopic redshifts would be required to

335: characterize their effect.  Mandelbaum \etal\ (2007) discuss some

336: related aspects in the context of galaxy-galaxy lensing.

337:

338: The applicability of this work to training-set methods depends on the

339: details of the method.  An advantage of training set methods is that

340: they may ``learn'' the correct noise model automatically, and

341: therefore should not require any modification to reach optimum

342: performance (which is presumably still much reduced compared to the

343: no-tails case).  But for this to happen, the training set must be

344: sufficiently large to encompass the non-Gaussian features of the

345: photometry.  This may require a rather larger training set than would

346: otherwise be required, and it also requires a training set that is not

347: cleaner than the full dataset.  However, it may be possible to build a

348: hybrid approach in which detailed knowledge of photometry error

349: distributions from large sets of Monte Carlos is combined with a

350: modest spectroscopic sample to train the algorithm.

351:

352: Non-Gaussian photometry errors may not be a substantial source of

353: catastrophic outliers in current surveys.  The SIM3/SIM4 tails may be

354: unrealistically heavy, as there is scant published data on the size of

355: the non-Gaussian tails for faint galaxy photometry.  Furthermore,

356: catastrophic outliers exist even with purely Gaussian photometry

357: errors, due to color-space degeneracies.  However, real-world

358: experience such as that of Cameron \& Driver (2007) and, in a

359: different context, Bolton \etal\ (2004), suggests that non-Gaussian

360: errors are often not negligible.  Color-space degeneracies are usually

361: {\it near}-degeneracies, and galaxies become much more likely to

362: scatter across a near-degeneracy if the photometry has

363: non-Gaussian tails.

364:

365: Our example started from an unrealistically good baseline of $S/N =

366: 20$ in each of seven filters and $\delta z_{rms}=0.026$, so the effect

367: of the tails was particularly dramatic.  Surveys starting from a more

368: realistic performance baseline will not see such a large fractional

369: increase in scatter, but may still see the effect of tails in the

370: overall error budget.  Limiting the tails of the photometry error

371: distribution and using an accurate error model will reduce photometric

372: redshift scatter and greatly reduce the size of the spectroscopic

373: sample required to calibrate the scatter.

374:

375: \begin{thebibliography}{}

376:

377: \bibitem[Benitez(2000)]{2000ApJ...536..571B} Benitez, N.\ 2000, \apj, 536, 571

378:

379: \bibitem[Bolton et al.(2004)]{2004AJ....127.1860B} Bolton, A.~S., Burles,

380: S., Schlegel, D.~J., Eisenstein, D.~J., \& Brinkmann, J.\ 2004, \aj, 127,

381: 1860

382:

383: \bibitem[Connolly et al.(1995)]{1995AJ....110.2655C} Connolly, A.~J.,

384: Csabai, I., Szalay, A.~S., Koo, D.~C., Kron, R.~G., \& Munn, J.~A.\ 1995,

385: \aj, 110, 2655

386:

387: \bibitem[Fern{\'a}ndez-Soto et al.(1999)]{1999ApJ...513...34F}

388: Fern{\'a}ndez-Soto, A., Lanzetta, K.~M., \& Yahil, A.\ 1999, \apj, 513, 34

389:

390: % intro to template noise

391: \bibitem[Fern{\'a}ndez-Soto et al.(2001)]{2001ApJS..135...41F}

392: Fern{\'a}ndez-Soto, A., Lanzetta, K.~M., Chen, H.-W., Pascarelle, S.~M., \&

393: Yahata, N.\ 2001, \apjs, 135, 41

394:

395: \bibitem[Hogg et al.(1998)]{1998AJ....115.1418H} Hogg, D.~W., et al.\ 1998,

396: \aj, 115, 1418

397:

398: \bibitem[Huterer et al.(2006)]{2006MNRAS.366..101H} Huterer, D., Takada,

399: M., Bernstein, G., \& Jain, B.\ 2006, \mnras, 366, 101

400:

401: \bibitem[Ivezi{\'c} et al.(2003)]{2003MmSAI..74..978I} Ivezi{\'c}, {\v Z}.,

402: et al.\ 2003, Memorie della Societa Astronomica Italiana, 74, 978

403:

404: \bibitem[Ma et al.(2006)]{2006ApJ...636...21M} Ma, Z., Hu, W., \& Huterer,

405: D.\ 2006, \apj, 636, 21

406:

407: \bibitem[Mandelbaum et al.(2007)]{2007arXiv0709.1692M} Mandelbaum, R., et

408: al.\ 2007, ArXiv e-prints, 709, arXiv:0709.1692

409:

410: \bibitem[Margoniner \& Wittman(2007)]{2007arXiv0707.2403M} Margoniner,

411: V.~E., \& Wittman, D.~M.\ 2007, ArXiv e-prints, 707, arXiv:0707.2403

412:

413: \end{thebibliography}

414: \end{document}

415:

416:

417: