1:

2: \documentclass[12pt,preprint]{aastex}

3: %\usepackage{amsfonts}

4: %\usepackage{amssymb}

5: %\usepackage{epsfig}

6: %\usepackage[round]{natbib}

7:

8: %\setlength{\topmargin}{5bp}

9: %\setlength{\topskip}{0in}

10: %\setlength{\headsep}{0pt}

11: %\setlength{\headheight}{0pt}

12: %\setlength{\textwidth}{6.4in}

13: %\setlength{\textheight}{8in}

14: %\setlength{\footskip}{0.25in}

15: %\setlength{\oddsidemargin}{5bp}

16: %\setlength{\evensidemargin}{5bp}

17: %\renewcommand{\baselinestretch}{1.66}

18: %\setlength{\emergencystretch}{2em} % Add a little slop

19: %\DeclareMathSizes{12}{12}{10}{10} % Make large super/subscripts

20: %\setlength{\footnotesep}{0.6cm}

21:

22:

23: %\renewcommand{\baselinestretch}{2}

24: %\tolerance=500

25:

26:

27:  \begin{document}

28: %

29:

30: %\bibliographystyle{apj}

31: %\bibliographystyle{elsart-harv}

32:

33: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

34:

35: \title{A valid and fast spatial bootstrap for correlation functions}

36: \shorttitle{Spatial bootstrap for correlation functions}

37: \author{Ji Meng Loh}

38:

39: \affil{Department of Statistics, Columbia University, New York, 10027,

40: USA}

41: \email{meng@stat.columbia.edu}

42:

43:

44: \begin{abstract}

45:

46: In this paper, we examine the validity of non-parametric spatial

47: bootstrap as a procedure to

48: quantify errors in estimates of $N$-point correlation functions.

49: We do this by means of a small simulation study with simple point

50: process models, in which we estimate the two-point correlation

51: functions and their errors. The coverage of confidence intervals

52: obtained using bootstrap is compared with that of intervals obtained by assuming

53: Poisson errors. The bootstrap procedure considered here is adapted for

54: use with spatial (i.e.\ dependent) data. In particular, we describe a

55: marked point bootstrap where, instead of resampling points or blocks

56: of points, we resample marks assigned to the data points. These marks

57: are numerical values that are based on the statistic of interest.

58: We describe how the marks are defined for the two- and three-point

59: correlation functions. By

60: resampling marks, the bootstrap samples retain more of the dependence

61: structure present in the data. Furthermore, this method of bootstrap

62: can be performed much more quickly than some other bootstrap methods for

63: spatial data, making it a more practical method with large datasets.

64: We find that with clustered point datasets, confidence intervals

65: obtained using the marked point bootstrap have empirical coverage

66: closer to the nominal level than the confidence intervals obtained

67: using Poisson errors. The bootstrap errors were also found to be

68: closer to the true errors for the clustered point datasets.

69:

70:

71:

72: \end{abstract}

73:

74: \keywords{methods: statistical}

75:

76: \section{Introduction} \label{sect:intro}

77:

78: In analyses of survey data, such as those of galaxies or quasars,

79: $N$-point correlation functions are often

80: estimated \citep[e.g.][]{kulkarni07, mccracken07, shen07}. These help to describe the structure of the observed

81: objects, such as the filamentary structure of galaxies that

82: has been observed, and to constrain the parameters of cosmological

83: models. Such estimates are only as important as their associated errors,

84: since it is the errors that indicate

85: the amount of agreement between two sets of data or between data and

86: simulations from a model.

87:

88: In spatial point processes, the expressions for the standard errors of

89: correlation

90: (and similar) functions have been worked out only for the simplest of

91: models. For example, \citet{ripley88} found approximations for the

92: variance of the $K$ function, an integral of the two-point correlation

93: function, for the Poisson process. These depend on such factors as the

94: shape of the observation region and the type of correction method for

95: boundary effects. \citet{landy93}, \citet{hamilton93} and others

96: worked out approximations for standard errors of various estimators of

97: the two-point correlation function under the Poisson or weakly

98: clustered models. Thus, very often, approximations such as Poisson

99: errors are used instead. However,

100: point data arising in astronomy are typically clustered and

101: non-Poisson. So while Poisson errors are useful and easy to compute,

102: they only serve as rough indications of the size of errors.

103:

104: Besides using Poisson errors, errors can also be estimated by using

105: mock catalogs generated from a cosmological model. This method was

106: employed in \citet{eisenstein05}, where the initial conditions for the

107: cosmological model were selected independently. In the statistics

108: literature, this is referred to as parametric bootstrap, although the term

109: more commonly refers to mock datasets generated from the model using

110: parameter values fixed at the estimates from the data, instead of

111: being independently chosen.

112:

113: An alternative is to use non-parametric bootstrap. This

114: involves generating new samples, called bootstrap samples, by

115: resampling from the actual data, and computing estimates for these new

116: samples. The distribution of these bootstrap estimates then serves as

117: a proxy for the actual distribution of the data estimates, so that

118: statistical inference, such as the construction of confidence

119: intervals, can be performed. Note that the procedure does not make any

120: specific model assumptions, so the errors obtained by this method

121: can serve as a check of model assumptions.

122: Due to the simplicity and flexibility of the non-parametric bootstrap,

123: the method is attractive. What is desirable then is to make the

124: non-parametric bootstrap procedure work as well as possible for data

125: that is correlated, and check that it performs satisfactorily, so that

126: it can be useful as a tool in analysis.

127:

128: This paper thus examines the non-parametric bootstrap, specifically

129: bootstrap of spatial data, where the dependence present in the data is

130: of interest. There were some early misconceptions about how bootstrap

131: should be applied to spatial data. The naive method of resampling

132: individual points does not work in the spatial context.

133: In order for spatial bootstrap to be valid, the underlying

134: dependence structure has to be preserved as much as possible when

135: generating bootstrap samples.  Two common methods for doing this are the block

136: bootstrap and subsampling where blocks of data, instead of individual

137: data points, are resampled. We introduce these  methods

138: in Section \ref{sect:bootspatial} and describe their shortcomings.

139: In Section \ref{sect:improve} we describe the marked point bootstrap

140: \citep{loh02a}

141: as a way to address these shortcomings. We describe how the marked

142: point bootstrap can be used with the two- and three-point correlation

143: function estimators, and by extension to estimators of $N$-point

144: correlation functions.

145: In Section \ref{sect:simstudy} we present results of a simulation

146: study using simple point process models comparing the empirical

147: coverage of confidence intervals obtained using non-parametric

148: bootstrap and using normal approximations with Poisson errors.

149:

150: In this paper, we restrict ourselves to constructing nominal 95\%

151: confidence intervals, i.e.\ these confidence intervals are supposed to

152: contain the true value 95\% of the time. The empirical coverage of the

153: confidence intervals is the actual confidence level achieved by the

154: confidence intervals. In a simulation study with a known model, the

155: empirical coverage can be obtained by finding the proportion of confidence

156: intervals that contain the true value, which can then be compared with the nominal

157: level. It is desirable, of course, for the empirical coverage to be

158: close to the nominal level. Furthermore, it is often better for the

159: empirical coverage to be higher instead of lower than the nominal

160: level, so that the procedure is conservative.

161:

162: Bootstrap is a computationally intensive procedure. With the large

163: datasets now common in astronomy, even computing the $N$-point

164: correlation functions poses computational challenges. For example,

165: \citet{eisenstein05} avoided using the jackknife procedure for error

166: estimation because of the size of the data they used.

167: The way the

168: marked point bootstrap is formulated, however, makes it much faster than

169: subsampling (a generalization of the jackknife) and the

170: block bootstrap, so that applying the procedure to large datasets is

171: feasible as long as computing the actual estimates is

172: feasible. In Section \ref{sect:simstudy} we provide some

173: time measurements of the procedure used in our simulation study.

174:

175:

176: \section{Non-parametric bootstrap for spatial data}

177: \label{sect:bootspatial}

178:

179: The non-parametric bootstrap was originally developed for independent data

180: \citep{efron94}. The main idea

181: is to draw new samples from the actual data by sampling with

182: replacement a data point at a time. Bootstrap estimates of the same

183: statistic are computed from the bootstrap samples. With these

184: bootstrap estimates, confidence intervals, for example, can then be

185: constructed. This can be done in a variety of ways. Suppose $K$, $\hat{K}$ and

186: $\hat{K}^*_i, i=1, \ldots, B$ are respectively the quantity of

187: interest, the estimate of $K$ computed from the data and the bootstrap

188: estimates, with $B$ equal to the number of bootstrap samples.

189: A simple method, called the basic

190: bootstrap interval in \citet{davison97}, is to set

191: \begin{eqnarray}

192: & [2\hat{K}-\hat{K}^*_{(B+1)(1-\alpha/2)},

193: 2\hat{K}-\hat{K}^*_{(B+1)\alpha/2} ] & \label{eqn:basicCI}

194: \end{eqnarray}

195: as the $100(1-\alpha)$\% confidence interval for $K$. Here, $B$, the number

196: of bootstrap samples, is large, say, 999, and $\hat{K}^*_u$ is

197: the $u$-th ordered values of the bootstrap statistic. So, for example, with

198: $B=999$ bootstrap samples, a 95\% confidence interval for $K$ is given

199: by $[2\hat{K}-\hat{K}^*_{975},

200: 2\hat{K}-\hat{K}^*_{25} ]$. In our simulation studies, we use

201: (\ref{eqn:basicCI}) to construct the confidence intervals. Standard

202: errors for $\hat{K}$ are estimated by the standard deviation of the

203: bootstrap estimates $\hat{K}^*$.
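
As an illustration only, the interval (\ref{eqn:basicCI}) and the bootstrap standard error can be sketched in Python as follows. The function names \texttt{basic\_bootstrap\_interval} and \texttt{bootstrap\_standard\_error} are hypothetical, the sketch assumes that $(B+1)\alpha/2$ is an integer (as with $B=999$ and $\alpha=0.05$), and the use of the sample standard deviation (divisor $B-1$) is our assumption.

\begin{verbatim}
```python
import numpy as np


def basic_bootstrap_interval(k_hat, k_star, alpha=0.05):
    """Basic bootstrap interval for K from B bootstrap estimates k_star.

    Returns (2*k_hat - K*_((B+1)(1-alpha/2)), 2*k_hat - K*_((B+1)alpha/2)),
    where K*_(u) is the u-th smallest bootstrap estimate (1-based rank).
    """
    k_sorted = np.sort(np.asarray(k_star, dtype=float))
    b = k_sorted.size
    # Assumption: (B + 1) * alpha / 2 is an integer, e.g. B = 999, alpha = 0.05.
    lo_rank = int(round((b + 1) * alpha / 2))
    hi_rank = int(round((b + 1) * (1 - alpha / 2)))
    lower = 2 * k_hat - k_sorted[hi_rank - 1]  # convert rank to 0-based index
    upper = 2 * k_hat - k_sorted[lo_rank - 1]
    return lower, upper


def bootstrap_standard_error(k_star):
    """Standard deviation of the bootstrap estimates (divisor B - 1 assumed)."""
    return float(np.std(np.asarray(k_star, dtype=float), ddof=1))
```
\end{verbatim}

With $B=999$ and $\alpha=0.05$ the ranks used are 975 and 25, matching the example in the text.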

204:

205: While there are other methods of constructing confidence intervals

206: from bootstrap samples \citep[see][for example]{davison97}, the interest here is in the method of

207: generating the bootstrap samples, when the data is

208: spatial. \citet{snethlage99} rightly concludes that resampling

209: individual points does not work. If the resampled points are placed in

210: their original positions in the observation region, there will be

211: multiple points at single locations, a situation that does not usually occur in

212: actual data sets. In their claim that bootstrap cannot be used for

213: the analysis of clustering, \citet{simpson86} were also considering

214: bootstrap in terms of resampling individual points.

215:

216: Due to the success of bootstrap for resampling independent data, it

217: has been extended to resample dependent data. Most of this work is for

218: time series, but can easily be applied to spatial data in two and

219: three dimensions. A common method is the block bootstrap: blocks of

220: the spatial data are sampled at random, then joined together to form a new

221: sample \citep{hall85, kunsch89, liu92}. Asymptotic arguments for the

222: validity of the bootstrap involve

223: limiting the range of dependence, increasing the observation region size

224: and letting the resampling block size increase but at a slower rate

225: than the observation region size. In this asymptotic setting, we then

226: have many almost independent blocks of data, with each block itself

227: containing a large subsample \citep[see e.g.][]{lahiri03}.

228: However, the assumed conditions necessary to make the

229: calculations tractable also mean that normal approximations work well too.

230: Some theoretical results show that the accuracy of bootstrap

231: estimates is of a higher order than the normal approximations. Whether

232: this difference is meaningful in actual practice is less clear.

233: We believe that the role of non-parametric bootstrap is to serve as another

234: objective method for obtaining standard errors, one that does not make any model

235: assumptions. Error estimates obtained using bootstrap can be used as a

236: way to assess or compare with other estimates of

237: errors.

238:

239:

240: \citet{kemball05} is a recent work in the astronomy literature

241: that examined bootstrap for dependent data. For the non-parametric

242: bootstrap, they focused on

243: subsampling \citep{politis93a, politis99}, which can be considered

244: a generalization of the jackknife procedure. In subsampling,

245: random portions of the data are deleted, and the remaining data are

246: treated as bootstrap samples. The standard deviation of the estimates computed

247: from these samples serves as an estimate of the error, after rescaling by a factor

248: that adjusts for the smaller subsamples and the large overlap.

249:

250: When estimating correlation functions, pairs, triplets, etc.\ of points

251: have to be counted. By joining independently resampled blocks together to

252: form the bootstrap sample, the block bootstrap creates artificial

253: configurations of points across the resampling blocks and distorts the

254: dependence structure in the data. This does not matter in asymptotic

255: arguments because the effect becomes negligible if the range of the

256: correlation is fixed while the resampling blocks increase in

257: size. However, \citet{loh02a} found that the

258: actual coverage achieved by confidence intervals obtained using block

259: bootstrap can be much lower than the nominal percentage level for finite samples.

260:

261: In subsampling, no artificial configurations of points are

262: created. However, while the correction weight accounts for the

263: difference in sample sizes between the bootstrap samples and the

264: actual data set, it does not account for the change in the boundary

265: effects due to the different resampling regions. Since subsampling

266: uses smaller regions as the bootstrap observation regions, boundary

267: effects are magnified.

268: For subsampling, there is the temptation to use large

269: subsamples to try and retain more of the dependence structure, but

270: like block bootstrap, theoretical justification of the method requires

271: that the subsamples be small in size relative to the actual data set.

272: \citet{loh02a} also found that subsampling can

273: yield confidence intervals that attain very low empirical

274: coverage. They also found that the subsampling method is sensitive to

275: the fraction of the data used for subsampling.

276:

277: \citet{loh02a}

278: proposed another version of spatial bootstrap, called marked point

279: bootstrap, that reduces the effect of joining independent blocks and

280: produces confidence intervals that achieve coverage closer to the

281: nominal level. This is described in the next section, where we also

282: show how it can be applied to the two- and three-point correlation

283: function estimators commonly used in astronomy.

284:

285:

286:

287: \section{Improving the non-parametric bootstrap of spatial data}

288: \label{sect:improve}

289:

290: Suppose $N$ points are observed in a region $A$. Furthermore, suppose

291: that the quantity of interest $K$ can be estimated using an estimator

292: of the form

293: \begin{eqnarray}

294: \hat{K} & = & \frac{1}{N}\sum_{i=1}^N\sum_{j=1 \atop j\ne i}^N

295: \phi(x_i, x_j) \equiv \hat{\Phi}/N.

296: \label{eqn:estimator}

297: \end{eqnarray}

298: Note that each point $i$ has an associated quantity $\sum_{j=1, j\ne i}^N

299: \phi(x_i,x_j)$, the inner sum of equation (\ref{eqn:estimator}).

300: Estimators of two-point statistics can be expressed in this form. In

301: this case, the

302: quantity $\phi(x_i,x_j)$ will depend on the distance

303: between $x_i$ and $x_j$. As an

304: aside, note that estimators of three-point statistics

305: can be written in a similar form, with the inner sum replaced by a

306: double sum.

307:

308: With point data, the term ``mark'' is used to refer to some additional

309: information associated with a point. This is usually some actual

310: measured value. For galaxy data, for example, marks could be

311: quantities such as luminosity, color and so on. In this paper, the

312: bootstrap method considered uses marks associated with the

313: points. However, these marks are not quantities such as luminosity

314: that are directly measured. Instead they are numerical quantities that

315: we construct and associate with the points. The actual values of these

316: marks are not random, but are constructed so that they relate to the

317: statistic that is of interest. If the statistic of interest is given

318: by equation (\ref{eqn:estimator}), then the mark associated with point

319: $i$, denoted by $m_i$, is equal to $\sum_{j=1, j\ne i}^N

320: \phi(x_i,x_j)$, so that\ $\hat{\Phi} = \sum_i m_i$. At the risk of

321: being repetitive, suppose that

322: $\hat{\Phi} \equiv DD(r) = \sum_{x\in D} \sum_{y\in D:y\ne x} 1\{ |x-y| \in

323: (r-dr,r+dr)\}$, for some $r$, is of

324: interest. This quantity is used in estimators of the

325: two-point correlation function, and is the number of pairs of points

326: separated by (roughly) distance $r$. Then the mark associated with

327: point $x$ is $\sum_{y\in D:y\ne x} 1\{ |x-y| \in (r-dr,r+dr)\}$, the

328: number of points that are roughly distance $r$ away from $x$. Note

329: that the sum of all the marks gives back the value of $DD(r)$. It is

330: also important to note that to compute the estimate

331: (\ref{eqn:estimator}), the marks have to be calculated anyway.

332: In regular applications, the algorithm doing the estimation does not

333: individually record these marks, but keeps a running sum of the marks.

334: For the marked point bootstrap, the

335: difference in terms of the code is that the marks now have to be

336: stored so that they can be used in the bootstrap step.
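
A minimal Python sketch of storing the marks for $DD(r)$ is given here for illustration. The function name \texttt{pair\_count\_marks} is hypothetical, the brute-force distance matrix is only practical for small $N$, and no edge correction is applied; these choices are assumptions of the sketch rather than a description of the code used in this paper.

\begin{verbatim}
```python
import numpy as np


def pair_count_marks(points, r, dr):
    """Mark of each point x: number of y != x with |x - y| in (r - dr, r + dr).

    Brute-force O(N^2) sketch without edge correction (an assumption made
    here for brevity); the sum of the returned marks equals DD(r).
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    in_bin = (dist > r - dr) & (dist < r + dr)
    np.fill_diagonal(in_bin, False)  # exclude the y == x terms
    return in_bin.sum(axis=1)        # one stored mark per point
```
\end{verbatim}

Keeping the returned array, instead of only its sum, is the extra storage that the bootstrap step needs.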

337:

338: In the block bootstrap, blocks of data

339: points are resampled and then joined together, forming a new dataset

340: from which $K$ is estimated using the new configuration of points that

341: was generated, yielding $\hat{K}^*$. In the marked point bootstrap,

342: blocks can be used to resample points as well. However, the crucial

343: difference is that the bootstrap estimate is computed, not from how

344: the resampled points are positioned, but from the marks that are associated

345: with these points. In other words, the marked point bootstrap

346: resamples the marks rather than the points and the bootstrap estimate is

347: computed by summing these resampled marks.

348:

349: To be more precise, suppose that $N^*$ points have been resampled,

350: with the resampled points denoted by $x_j^*, j=1, \ldots,

351: N^*$. Associated with each $x_j^*$ is a mark $m_{j^*}$. We denote this

352: mark  $m_{j^*}$ rather than  $m_j^*$ to emphasize the fact that these

353: marks are sampled from the actual data, i.e.\ computed from the

354: original dataset and not from the bootstrap sample. Then the bootstrap

355: estimate of $K$ is given by the average of the resampled marks: $$\hat{K}^*

356: = \hat{\Phi}^*/N^* =

357: \sum_{j=1}^{N^*} m_{j^*}/N^*,$$ just like $\hat{K}$ is given by the average of

358: the actual marks. Note that in an actual implementation of the

359: procedure, all that is required is keeping track of how many times each

360: point is resampled.

361: The step-by-step procedure for estimating and resampling the

362: quantity (\ref{eqn:estimator}) is as follows:

363:

364: \begin{enumerate}

365: \item For each point $i$, calculate $m_i = \sum_{j=1, j\ne i}^N

366:   \phi(x_i,x_j)$.

367: \item Obtain the estimate $\hat{K}$, using $\hat{K}= \sum_i m_i/N$.

368: \item Resample the points. This can be done by randomly placing blocks

369:   on to the observation region and keeping track of which point is

370:   resampled. Suppose point $i, i=1,\ldots , N$ is resampled $n^*_i$ times, and

371:   $N^*=\sum_i n_i^*$.

372: \item The bootstrap estimate is then $\hat{K}^* = \sum_i (n^*_i \times

373:   m_i)/N^*$.

374: \item Repeat steps 3 and 4 to get $B$ bootstrap estimates.

375: \item Construct a confidence interval using (\ref{eqn:basicCI}).

376: \end{enumerate}
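
The steps above can be sketched in Python as follows, for illustration only. The function name \texttt{marked\_point\_bootstrap} is hypothetical. The sketch assumes that the points lie in the unit square, that the marks $m_i$ of step 1 have already been computed, and that resampling selects cells of a regular grid with replacement instead of placing blocks at random; these are simplifying assumptions of the sketch.

\begin{verbatim}
```python
import numpy as np


def marked_point_bootstrap(points, marks, n_side=3, n_boot=999, rng=None):
    """Return n_boot bootstrap estimates K* = sum_i(n*_i m_i) / N*.

    Assumptions (not from the text): points lie in the unit square and
    resampling uses a fixed n_side x n_side grid of blocks.
    """
    rng = np.random.default_rng(rng)
    pts = np.asarray(points, dtype=float)
    m = np.asarray(marks, dtype=float)  # step 1 is assumed done: m[i] = m_i
    n_blocks = n_side ** 2
    # Label each point with the index of the grid block that contains it.
    cell = np.minimum((pts * n_side).astype(int), n_side - 1)
    block_id = cell[:, 0] * n_side + cell[:, 1]
    k_star = np.empty(n_boot)
    for b in range(n_boot):
        # Step 3: draw n_blocks blocks with replacement, so the resampled
        # area equals the original area, and count how often each is drawn.
        drawn = rng.integers(0, n_blocks, size=n_blocks)
        times_drawn = np.bincount(drawn, minlength=n_blocks)
        n_star_i = times_drawn[block_id]  # n*_i for every point i
        n_star = n_star_i.sum()           # N* = sum_i n*_i
        # Step 4: weighted mean of the original marks (NaN if N* = 0).
        k_star[b] = (n_star_i * m).sum() / n_star if n_star > 0 else np.nan
    return k_star
```
\end{verbatim}

Step 2 corresponds to \texttt{marks.mean()}. Each bootstrap replicate costs only of order $N$ operations because no inter-point distances are recomputed.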

377:

378: A few remarks about the procedure are in order.

379:

380: \noindent {\bf Remark 1} Instead of randomly placing blocks, the observation region can

381:   be divided into a number of subregions, and the regions selected

382:   randomly with replacement. This latter method is sometimes referred

383:   to as using fixed blocks as opposed to moving blocks. It

384:   is generally considered that the moving blocks bootstrap works

385:   better in terms of convergence rates in asymptotic arguments.

386:

387: \noindent {\bf Remark 2} The number of blocks is chosen so that the total area/volume of

388:   the blocks is equal to the original area/volume of the observation

389:   region. Note that in this case $N^*$ would usually not be equal to

390:   $N$, though they will be of the same order of magnitude. However,

391:   this does not pose problems since the statistic $\hat{K}$ is a mean

392:   of the marks.

393:

394: \noindent {\bf Remark 3} There is no real consensus on the size of the resampling blocks

395:   to use. \citet{buhlmann99} did some work on determining the optimal block size

396:   from data. Intuitively, the procedure needs large blocks so that

397:   the correlation structure is less distorted, and a large enough

398:   number of blocks so that there is enough variability between

399:   bootstrap samples. If $k$ represents the number of blocks and $N$

400:   the number of data points (which is assumed to increase with the

401:   observation region size), theoretical work in e.g.\ \citet{lahiri03}

402:   suggests that consistency is

403:   achieved as $k\to \infty$ and $N/k \to \infty$. Thus some trade-off

404:   is needed. A rule-of-thumb is to divide each dimension

405:   of the observation region into at least three parts, i.e.\ nine

406:   blocks in 2D, 27 blocks in 3D. This would ensure enough variability

407:   between bootstrap samples. Of course, for correlation functions, the

408:   maximum value of the separation distance $r$ at which these

409:   functions are estimated would influence the

410:   decision on block size.

411: Fortunately, \citet{loh02a} found that the marked point bootstrap is less

412:   sensitive to block size than block bootstrap or subsampling: they

413:   resampled an absorber catalog using slices of the sphere and found that

414:   the bootstrap errors were similar for different sizes of the

415:   slices. Our simulation results also show little difference due to

416:   different block size.

417:

418:

419:

420: There are a few advantages to this form of spatial bootstrap over the

421: regular block bootstrap. Since the bootstrap estimates are based on

422: the resampled marks and not on marks recalculated from the bootstrap

423: sample, the contribution to the bootstrap estimate is due to actual

424: pairs of points in the original dataset. This helps to minimize the

425: distortion of dependence structure in the dataset due to

426: resampling.

427:

428: Furthermore, for any block of resampled points,

429: information about the points just outside the block (and therefore not

430: sampled by this particular block) is captured by the marks associated

431: with the points that are sampled by the block. This helps to reduce

432: the variability of bootstrap results due to the size of the

433: resampling blocks, compared to block bootstrap or subsampling.

434: Also, since the resampling blocks do not need to be

435: joined together to form a contiguous region for the bootstrap sample,

436: there is flexibility in the choice of the shape of the resampling regions.

437:

438: Lastly, the marked point bootstrap can be performed relatively quickly

439: compared to block bootstrap or subsampling. The marks that are

440: associated with the points are

441: part of the actual estimator and are already computed in the

442: estimation step. Resampling using the marked point bootstrap only

443: involves identifying which points are resampled with each resampling

444: region, and keeping track of how many times each point is

445: resampled. Inter-point distances and edge correction weights do not

446: have to be recalculated. With $N$ data points and $B$ bootstrap

447: samples, block bootstrap will take roughly $BN^2$ computations for a

448: statistic involving pairs of points. The marked point bootstrap will

449: involve roughly $N^2 + BN$ computations. The difference will be more

450: pronounced for three-point computations.
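
As a rough illustration of the two-point operation counts (the values $N=5\times 10^4$ and $B=999$ are chosen here for illustration only, and constant factors are ignored), the block bootstrap would need about
\begin{eqnarray}
BN^2 & = & 999 \times (5\times 10^4)^2 \approx 2.5\times 10^{12} \nonumber
\end{eqnarray}
computations, whereas the marked point bootstrap would need about
\begin{eqnarray}
N^2 + BN & = & 2.5\times 10^{9} + 999 \times 5\times 10^4 \approx 2.55\times 10^{9}, \nonumber
\end{eqnarray}
a reduction by a factor of roughly $10^3$, i.e.\ of order $B$.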

451:

452: Simulation studies done in \citet{loh02a} showed that the empirical

453: coverages of confidence intervals obtained using the marked point

454: bootstrap can be much closer to the nominal 95\% level than those

455: obtained with block bootstrap or subsampling.

456:

457: We now describe how the marked point bootstrap can be used with

458: estimators of the two-point correlation function. The common

459: estimators of the two-point correlation function $\xi(r)$ are

460: \begin{eqnarray}

461: \hat{\xi}_{Nat}(r) & = & \frac{dd(r)}{rr(r)}-1, \label{eqn:nat} \\

462: \hat{\xi}_{DP}(r) & = & \frac{dd(r)}{dr(r)}-1, \label{eqn:dp} \\

463: \hat{\xi}_{Ham}(r) & = & \frac{dd(r)\cdot rr(r)}{dr(r)^2} -1, \label{eqn:ham}\\

464: \hat{\xi}_{Landy}(r) & = & \frac{dd(r)-2dr(r)}{rr(r)}+1, \label{eqn:landy} \\

465: \hat{\xi}_{Hewett}(r) & = & \frac{dd(r)-dr(r)}{rr(r)}, \label{eqn:hewett}

466: \end{eqnarray}

467: which are, respectively, the natural estimator \citep{kerscher2000}, and

468: estimators due to \citet{davis83, hamilton93, landy93, hewett82},

469: where $r$ is

470: some distance of interest. In these expressions, $dd(r) = DD(r)/N^2, dr(r) =

471: DR(r)/NN_R$ and $rr(r)=RR(r)/N_R^2$, where $DD(r)=\sum_{x\in D} \sum_{y\in D: y\ne

472:   x} 1\{ |x-y| \in (r-dr, r+dr)\}$,

473: $DR(r)=\sum_{x\in D} \sum_{y\in R} 1\{ |x-y| \in (r-dr, r+dr)\}$

474: and $RR(r)=\sum_{x\in R} \sum_{y\in R: y\ne x} 1\{ |x-y| \in (r-dr, r+dr)\}$,

475: $R$ is a set of randomly generated points

476: (i.e.\ Poisson) in the observation region $A$, and $N$ and $N_R$ are

477: respectively the number of points in the real and random data sets.

478:

479: To apply the marked point bootstrap, assign to

480: each point $x$ of the dataset marks $m_{x,1}=\sum_{y\in D: y\ne

481:   x} 1\{ |x-y| \in (r-dr, r+dr)\}$ and $m_{x,2}=\sum_{y\in R} 1\{

482: |x-y| \in (r-dr, r+dr)\}$. Bootstrap proceeds by resampling blocks of

483: points and recording the marks associated with them. For a bootstrap

484: sample, $x^*_j, j=1, \ldots, N^*$, we then have

485: $$DD^*(r) = \sum_{j=1}^{N^*} m_{x^*_j,1}, \qquad

486: DR^*(r) = \sum_{j=1}^{N^*} m_{x^*_j,2},$$

487: and bootstrap estimates of the two-point correlation functions are

488: then obtained by substituting the above into

489: (\ref{eqn:nat})-(\ref{eqn:hewett}).

490: If each point $x_i$ of the actual data is resampled $n_i^*$ times, so

491: that $N^* = \sum_i n_i^*$, $DD^*(r)$ and $DR^*(r)$ can also be written

492: as

493: $$DD^*(r) = \sum_{i=1}^N (n_i^* \times m_{x_i,1}), \qquad DR^*(r) =

494: \sum_{i=1}^N (n_i^* \times m_{x_i, 2}).$$

495:

496: Note that $RR$ does not need to be resampled, since it is used as an

497: approximation to an integral and has nothing to do with the actual

498: data. If, as is usually the case, estimation of $\xi(r)$ is needed for a

499: range of values of $r$, then the marks $m_{x,1}$ and $m_{x,2}$ would

500: be vectors, containing the relevant values for each value of $r$.
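
For illustration, vector-valued marks and the resampled counts $DD^*(r)$ and $DR^*(r)$ can be sketched in Python as follows. The function names \texttt{vector\_marks} and \texttt{resampled\_counts} are hypothetical, and the brute-force distance computation without edge correction is an assumption of the sketch.

\begin{verbatim}
```python
import numpy as np


def vector_marks(data, other, r_values, dr, same_set=False):
    """Marks of every data point x for all r at once.

    Entry [i, k] counts points y in `other` with
    |x_i - y| in (r_k - dr, r_k + dr). Use same_set=True when `other` is the
    data set itself, so that the y == x terms are excluded (marks m_{x,1});
    use the random catalogue as `other` for the marks m_{x,2}.
    """
    d = np.asarray(data, dtype=float)
    o = np.asarray(other, dtype=float)
    dist = np.sqrt(((d[:, None, :] - o[None, :, :]) ** 2).sum(axis=-1))
    if same_set:
        np.fill_diagonal(dist, np.inf)  # drop the y == x terms
    r = np.asarray(r_values, dtype=float)
    in_bin = (dist[:, :, None] > r - dr) & (dist[:, :, None] < r + dr)
    return in_bin.sum(axis=1)  # shape (N, number of r values)


def resampled_counts(marks, n_star):
    """DD*(r) or DR*(r) for all r: sum_i n*_i * m_{x_i}(r)."""
    return np.asarray(n_star) @ np.asarray(marks)
```
\end{verbatim}

Here \texttt{n\_star} holds the resampling counts $n_i^*$, so one matrix product yields the bootstrap counts for every value of $r$.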

501:

502: Estimators of the three-point correlation function can be bootstrapped

503: in a similar way. For example, an estimator of the three-point

504: correlation function is

505: \begin{eqnarray}

506: \zeta & = & \frac{ddd-ddr}{rrr} + 2, \label{eqn:3ptest}

507: \end{eqnarray}

508: introduced by \citet{peebles75}, where $ddd = DDD/N^3, ddr =

509: DDR/N^2N_R$ and $rrr = RRR/N_R^3$ and $DDD, DDR, RRR$ are counts

510: of triplets of points with the desired configuration, $DDD$ with all

511: points from the real data set and so on. The contribution to $DDD$

512: by any particular triplet of points is divided by 3 and assigned as

513: marks to each of the three points. For any individual point, all

514: these marks are summed together. For $DDR$, the contribution by each

515: triplet is

516: divided by 2 and assigned to the two real

517: data points. Bootstrap proceeds by resampling the real data points and

518: the values of $DDD^*$ and $DDR^*$ found by

519: adding the marks of the resampled points.

520: Substituting these into

521: (\ref{eqn:3ptest}) gives the bootstrap estimate. Other similar

522: estimators, such as the three-point

523: estimator of \citet{jing98} or the $N$-point estimators of

524: \citet{szapudi98}, can be bootstrapped in the same way.

525:

526:

527: \section{Simulation study}

528: \label{sect:simstudy}

529:

530: We performed a simple simulation study to compare the performance of

531: confidence intervals obtained using the marked point bootstrap with

532: those obtained using normal approximations with Poisson errors,

533: varying the observation region size, number density and point process

534: model. For computational simplicity, we restrict ourselves to two dimensions.

535: We also performed an additional study with a large observation region

536: and approximately 50,000 points, showing the applicability of the

537: marked point bootstrap to datasets of size comparable to current

538: astronomy datasets.

539:

540: We used the Poisson point process model and a Neyman-Scott model to

541: generate the data points. The Neyman-Scott model is of historical

542: interest in astronomy as a model for galaxies \citep{neyman52}. It is

543: still commonly

544: used to model point data in other fields \citep{diggle03, waag07}. We

545: chose the Neyman-Scott model as it is a model for clustered data with

546: closed-form expressions for the two-point correlation

547: function. The Neyman-Scott point datasets that we used are generated as

548: follows: parent points are distributed as a Poisson point

549: process with intensity $\lambda_p$. A Poisson-distributed number of

550: offspring points, with mean $m$, is then randomly scattered

551: about each parent point. The collection of offspring points forms the

552: point process. We set the dispersion function of offspring points

553: about parent points to be a bivariate normal density centered at the parent

554: point, with standard deviation $\sigma$. This specific Neyman-Scott

555: model is sometimes referred to as the modified Thomas model

556: \citep{stoyan95}. The

557: two-point correlation

558: function, $\xi(r)$, is zero for the

559: Poisson model, while

560: $$\xi(r) = \frac{1}{4\pi \lambda_p \sigma^2}\exp\left\{

561: -\frac{r^2}{4\sigma^2} \right\}$$ for the modified Thomas model.

562: Thus the point pattern from a modified Thomas model is more clustered

563: if $\lambda_p$ or $\sigma$ is smaller. The quantity $\sigma$ also

564: controls the range of the correlation, with the range larger for

565: larger values of $\sigma$. We used several

566: values for $\lambda_p, m$ and $\sigma$ in our simulation study.
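As an illustration of the generating mechanism just described, the
following Python sketch simulates a modified Thomas pattern on a square
and evaluates the closed-form $\xi(r)$. It is a minimal sketch and not
the code used for this study: the function names
\texttt{poisson\_draw}, \texttt{simulate\_thomas} and
\texttt{xi\_thomas}, and the choice to pad the parent window by
$4\sigma$ to limit edge effects, are assumptions made for the example.
Only the Python standard library is used.

\begin{verbatim}
```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate: count unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_thomas(lambda_p, m, sigma, side=1.0, pad_sigmas=4.0, seed=None):
    """Simulate a modified Thomas pattern on [0, side]^2 (illustrative helper)."""
    rng = random.Random(seed)
    # Parents are placed on an enlarged window so that clusters centred
    # just outside the observation region can still contribute offspring.
    pad = pad_sigmas * sigma
    width = side + 2.0 * pad
    points = []
    for _ in range(poisson_draw(lambda_p * width * width, rng)):
        px = rng.uniform(-pad, side + pad)
        py = rng.uniform(-pad, side + pad)
        # Poisson(m) offspring, displaced by an isotropic bivariate normal.
        for _ in range(poisson_draw(m, rng)):
            x, y = rng.gauss(px, sigma), rng.gauss(py, sigma)
            if 0.0 <= x <= side and 0.0 <= y <= side:
                points.append((x, y))  # only offspring are retained
    return points


def xi_thomas(r, lambda_p, sigma):
    """Closed-form two-point correlation function of the modified Thomas model."""
    return math.exp(-r * r / (4.0 * sigma ** 2)) / (
        4.0 * math.pi * lambda_p * sigma ** 2
    )
```
\end{verbatim}

The expected number of retained points is approximately $\lambda_p m$
times the area of the square, since the intensity of the offspring
process is $\lambda_p m$.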

567:

568: For each point process model, we generated 500 realizations on the

569: unit square. For each

570: realization, we estimated $\xi(r)$ for $r=0.01, \ldots ,

571: 0.1$. Bootstrap estimates were then produced from each realization

572: and a nominal 95\% confidence interval constructed. Thus for each

573: point process model, we have 500 95\% confidence intervals. We then

574: checked the empirical coverage, i.e.\ the proportion of these that

575: contained the true value of

576: $\xi(r)$, with proportion closer to 95\% being desirable. We also

577: constructed 500 confidence intervals using the normal approximation with

578: Poisson errors. The Poisson error $e_p$ is the inverse of the pair

579: counts for an uncorrelated data set of the same size as the actual data, as

580: given by \citet{landy93}.  The

581: 95\% confidence intervals for $\xi$ based on the normal approximation

582: are thus given by $(\hat{\xi}\pm

583: 2e_p)$. We then found the empirical coverage of these confidence

584: intervals. We then repeated the procedure for the $2\times 2$ and

585: $4\times 4$ squares. The results are summarized in Figures

586: \ref{fig:HamPoiCoverage} to \ref{fig:Booterrors}.
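The coverage check described above can be sketched in a few lines of
Python. This is an illustration only: the helper names
\texttt{normal\_interval} and \texttt{empirical\_coverage} are
hypothetical, the error passed in is assumed to have been computed
already (for example the Poisson error $e_p$), and the multiplier
$z=2$ follows the interval $\hat{\xi}\pm 2e_p$ used in the text. The
same coverage function applies to bootstrap intervals.

\begin{verbatim}
```python
def normal_interval(xi_hat, err, z=2.0):
    """Normal-approximation interval xi_hat +/- z * err."""
    return (xi_hat - z * err, xi_hat + z * err)


def empirical_coverage(intervals, truth):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= truth <= upper)
    return hits / len(intervals)
```
\end{verbatim}

In the study, one such proportion would be computed for each value of
$r$, from the 500 intervals obtained at that $r$.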

587:

588: Figure \ref{fig:HamPoiCoverage} shows plots of the empirical coverage

589: of nominal 95\% confidence intervals of the two-point correlation

590: function for the Poisson process model, using the \citet{hamilton93}

591: estimator. Simulation results for the other estimators are similar and are not

592: shown. The thick solid lines in the plots show the empirical coverage

593: of confidence intervals obtained using normal approximation with

594: Poisson errors. Note that Poisson errors are correct in this case and

595: we find that the empirical coverage is close to 95\% for all the point

596: densities and observation region sizes considered.
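As a rough guide to how close to 95\% the empirical coverage can be
expected to lie, suppose that the 500 realizations are independent and
that the true coverage at a given $r$ is exactly $0.95$ (both are
assumptions of this illustration). The empirical coverage is then a
binomial proportion with standard error
$$\sqrt{\frac{0.95\times 0.05}{500}} \approx 0.0097,$$
so departures of about two percentage points or less from the nominal
level are within simulation noise.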

597:

598: The thin lines represent the empirical coverage of confidence

599: intervals obtained from the marked point bootstrap, with the different

600: line types representing different resampling block sizes. These were

601: squares of

602: lengths 0.5, 0.33 and 0.25 for the 1 by 1 regions, of lengths 1,

603: 0.67, 0.5, 0.33 for the 2 by 2 regions and of lengths 2, 1, 0.67,

604: 0.5 for the 4 by 4 regions (solid, dashed, dotted and dashed-dotted

605: lines respectively for increasingly smaller blocks). The differences

606: due to the block size used for resampling appear to be small. As

607: mentioned, this is an advantage of the marked point

608: bootstrap. \citet{loh02a} found greater variation of performance with

609: block size for subsampling and block bootstrap.

610:

611: Compared with the Poisson empirical coverage, we find that at low

612: densities and smaller observation region sizes (plots towards the

613: upper left of Figure \ref{fig:HamPoiCoverage}), the bootstrap method

614: does poorly. However, the empirical coverage of the bootstrap

615: confidence intervals quickly increases towards 95\% with increasing

616: density (down the columns in Figure \ref{fig:HamPoiCoverage}) and/or

617: observation region size (across the rows in Figure

618: \ref{fig:HamPoiCoverage}), i.e.\ with larger sample sizes.

619:

620: \clearpage

621: \begin{figure}

622: \begin{center}

623: \plotone{f1.eps}

624: \caption{Plots of the empirical coverage of nominal 95\% confidence

625:   intervals of the two-point correlation function for the Poisson

626:   point process model. The estimator used is that of

627:   \citet{hamilton93}. Confidence intervals are obtained using normal

628:   approximation with Poisson errors (thick solid line) and with the

629:   marked point bootstrap using different resampling block sizes.

630:  The block sizes were squares of

631: lengths 0.5, 0.33 and 0.25 for the 1 by 1 regions, of lengths 1,

632: 0.67, 0.5, 0.33 for the 2 by 2 regions and of lengths 2, 1, 0.67,

633: 0.5 for the 4 by 4 regions (solid, dashed, dotted and dashed-dotted

634: lines respectively for increasingly smaller blocks).

635: }

636: \label{fig:HamPoiCoverage}

637: \end{center}

638: \end{figure}

639: \clearpage

640:

641: \clearpage

642: \begin{figure}

643: \begin{center}

644: \plotone{f2.eps}

645: \caption{Plots of the empirical coverage of nominal 95\% confidence

646:   intervals of the two-point correlation function for the modified

647:   Thomas process model for realizations in a $2\times 2$ square. The

648:   estimator used is that of

649:   \citet{hamilton93}. Confidence intervals are obtained using normal

650:   approximation with Poisson errors (thick solid line) and with the

651:   marked point bootstrap using different resampling block sizes (see text).}

652: \label{fig:HamTomCoverage}

653: \end{center}

654: \end{figure}

655: \clearpage

656:

657:

658:

659: \clearpage

660: \begin{figure}

661: \begin{center}

662: \plotone{f3.eps}

663: \caption{Plots showing the true (solid), Poisson (dotted) and

664:   bootstrap (dashed) errors in estimates of $\xi$ for 500 sets

665:   of data simulated in a 2 by 2 square region using each of various

666:   point models. The true errors are obtained from the variability in

667:   the estimates of $\xi$ over the 500 data sets. For each data set,

668:   Poisson and bootstrap errors are computed. The errors shown in the

669:   plots are the average over the 500 data sets.}

670: \label{fig:Booterrors}

671: \end{center}

672: \end{figure}

673: \clearpage

674:

675:

676:

677: \clearpage

678: \begin{figure}

679: \begin{center}

680: \caption{Plots showing sample realizations of the modified Thomas

681:   model corresponding to four different sets of parameter values,

682:   simulated on a $20\times 20$ square. The degree of clustering is

683:   higher in the top row, while the range of

684: clustering is larger in the right column.}

685: \label{fig:r20Thomas}

686: \end{center}

687: \end{figure}

688:

689: \begin{figure}

690: \begin{center}

691: \plotone{f5.eps}

692: \caption{Plots of the coverage (left) and errors (right) for the

693:   bootstrap (solid) and Poisson (dashed)

694:   methods, based on 100

695:   simulated realizations from the Thomas model on the

696:   $20\times 20$ square. The thick solid lines in the plots on the

697:   right column represent the true errors.}

698: \label{fig:r20simulation}

699: \end{center}

700: \end{figure}

701: \clearpage

702:

703: The top left plot of Figure \ref{fig:Booterrors} shows the Poisson

704: errors and bootstrap errors for the Poisson point process model

705: simulated on the $2\times 2$ square. The bootstrap errors shown in

706: this figure are from resampling with $0.33 \times 0.33$ squares. Also

707: shown in the plot are the true errors

708: as obtained from the estimates from 500 realizations. Notice that both

709: the Poisson and bootstrap errors are close to the true errors.

710:

711:

712: Figure \ref{fig:HamTomCoverage} shows similar plots for various

713: modified Thomas models, each with number density 500.

714: The general behavior with increasing

715: observation region size for the Poisson model occurs here as

716: well. Thus to reduce the number of plots, we only include plots for the 2

717: by 2 observation regions, and show the relative performance of the

718: Poisson and bootstrap confidence intervals.

719:

720:

721:

722: We find that when the point pattern is only weakly clustered (left

723: plot, for the case $\lambda_p=100, m=2.5$ and $\sigma=0.15$),

724: the Poisson confidence intervals had empirical coverage close to the

725: nominal 95\% level. However, as the other two plots in Figure

726: \ref{fig:HamTomCoverage} show, the empirical coverages of the Poisson

727: confidence intervals become lower than 95\% as the degree and/or range

728: of clustering increases (i.e.\ with smaller $\sigma$ or $\lambda_p$).

729: On the other hand, the bootstrap confidence intervals attain coverage

730: much closer to 95\% for all the cases shown, regardless of the degree

731: of clustering. Plots of the Poisson and bootstrap error estimates are

732: shown in Figure \ref{fig:Booterrors}. Notice that the Poisson

733: approximation underestimates the true error as the degree of

734: clustering increases, while the bootstrap error estimates remain close

735: to the true errors, even for the modified Thomas model.
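For a sense of scale, evaluating the expression for $\xi(r)$ given
earlier in this section at $r=0$ for the weakly clustered case
($\lambda_p=100$, $\sigma=0.15$) gives
$$\xi(0) = \frac{1}{4\pi\times 100\times 0.15^2} \approx 0.035,$$
so the excess pair density is only a few percent even at the smallest
separations. Since $\xi(0)\propto 1/(\lambda_p\sigma^2)$, halving
$\sigma$ at fixed $\lambda_p$ increases this amplitude by a factor of
four.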

736:

737:

738:

739: Thus we find that the performance of the Poisson confidence intervals

740: is sensitive to the degree of clustering of the point pattern. If the point

741: pattern is Poisson, or weakly clustered, the empirical coverage of

742: Poisson confidence intervals is

743: close to the nominal level, even with small sample sizes. However,

744: performance quickly deteriorates with greater degree of clustering.

745: On the other hand, the bootstrap confidence interval does not perform

746: well with small sample sizes. With moderate sample

747: sizes, however, the bootstrap method performs rather well, over a wide

748: range in the degree of clustering.

749:

750: We performed an additional set of simulations using data sets of

751: roughly 50,000 points in a  $20\times 20$

752: square and estimating $\xi(r)$ for $r= 0.01$ to 2. Other than the

753: restriction to 2D, the data size

754: and range of $r$ are roughly of the scale found in current astronomy

755: data. We used the modified Thomas model and chose four sets of

756: parameter values, varying the degree and range of clustering but

757: with the same number density. A sample

758: realization from each of the four models is shown in Figure

759: \ref{fig:r20Thomas}. The models corresponding to the top row in Figure

760: \ref{fig:r20Thomas} have higher clustering than the models on the

761: bottom row. For models in the same row, the strength of clustering is

762: similar, but the model on the right has a longer correlation range.

763: We used square resampling blocks of side length 5, 2.5 and 2 and

764: results were very similar.

765:

766: The results are summarized in Figure \ref{fig:r20simulation}, which

767: shows the empirical coverage of confidence intervals (left column) and

768: errors (right column) obtained from the marked point bootstrap and

769: with Poisson errors, for each of the four Thomas models.

770: The plots qualitatively show the same relative

771: performance between Poisson errors and bootstrap as found in the

772: earlier simulation study. When the range of

773: clustering is large, the empirical coverage of confidence intervals

774: based on Poisson errors and the normal approximation is very low

775: (second and fourth plots on the left column of Figure

776: \ref{fig:r20simulation}). The

777: coverage of the bootstrap intervals is affected too, but by much less.

778: When the correlation is large, the Poisson errors substantially

779: underestimate the true errors, while the marked point bootstrap errors are

780: more realistic. At the larger values of $r$, especially when $\xi$ is

781: near 0, the bootstrap procedure appears to be somewhat conservative,

782: while the Poisson errors become more accurate.

783:

784: We made some time

785: measurements of various sections of the algorithm: the functions

786: computing $DD$ and $DR$ took 1 minute and 13 minutes

787: respectively. Here, $N_R = 200{,}000$ and we did not use any

788: sophisticated methods (such as tree-based algorithms) to speed up the

789: computation. The bootstrap function, generating 999 samples and

790: computing the estimates, took roughly 1 minute, showing the

791: feasibility of the procedure for large data sets.  The speed of the

792: marked point bootstrap is due to the fact that the marks that are

793: resampled have already been computed as part of the estimation. The

794: additional computational burden of the bootstrap is sampling the

795: points and keeping track of the number of times each point is

796: resampled.
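The resampling step described above can be sketched as follows. This
is a simplified illustration under stated assumptions, not the
implementation timed here: the function name
\texttt{marked\_point\_bootstrap} is hypothetical, \texttt{marks[i]}
is assumed to hold the precomputed contribution of point $i$ to the
statistic at each value of $r$, blocks are squares placed uniformly at
random inside the region without wrapping, and the number of blocks is
chosen so that their total area matches the area of the region. How
the resampled mark totals enter the estimator of $\xi$ follows the
construction given earlier in the paper and is not reproduced.

\begin{verbatim}
```python
import random


def marked_point_bootstrap(points, marks, side, block, n_boot=999, seed=None):
    """Return n_boot lists of resampled mark totals, one total per r bin."""
    rng = random.Random(seed)
    n_blocks = round((side / block) ** 2)  # total block area ~ region area
    n_bins = len(marks[0])
    samples = []
    for _ in range(n_boot):
        # counts[i] records how many times point i is resampled.
        counts = [0] * len(points)
        for _ in range(n_blocks):
            cx = rng.uniform(0.0, side - block)  # lower-left block corner
            cy = rng.uniform(0.0, side - block)
            for i, (x, y) in enumerate(points):
                if cx <= x < cx + block and cy <= y < cy + block:
                    counts[i] += 1
        # Marks are reused as-is: no pair counts are recomputed.
        samples.append(
            [sum(c * mk[k] for c, mk in zip(counts, marks)) for k in range(n_bins)]
        )
    return samples
```
\end{verbatim}

The only work per bootstrap sample is the point-in-block test and a
weighted sum of existing marks, which is why the cost is small
compared with computing $DD$ and $DR$.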

797:

798:

799: \section{Discussion}

800: \label{sect:discussion}

801:

802: In this paper, we introduced the marked point bootstrap as a

803: method to bootstrap spatial data for estimating errors without

804: specific model assumptions. In particular,

805: we described how the method can be applied to estimators of the two-

806: and three-point correlation functions.

807: With the non-parametric bootstrap, errors are obtained from the actual

808: data. There is no need to choose a model, select parameter values or

809: generate mock catalogs using $N$-body simulations. Thus errors

810: obtained from non-parametric bootstrap can be used to compare with

811: errors obtained from other methods with more specific

812: model assumptions.

813:

814: For non-parametric spatial bootstrap, we propose the marked point bootstrap

815: over the more common block bootstrap or subsampling methods. There are

816: several advantages of the marked point bootstrap.

817: Firstly, by using information from actual pairs

818: or triplets of points in the data, bootstrap confidence intervals

819: using the marked point bootstrap attain better empirical coverage than

820: confidence intervals constructed using block bootstrap (see

821: \citealt{loh02a} for a comparison of these two methods).

822:

823: Secondly, in

824: the marked point bootstrap, it is the marks that are used to compute

825: the bootstrap estimate. Thus the resampled points do not have to be

826: arranged to form a new point pattern. This makes it a lot easier to

827: bootstrap data that are observed in irregularly shaped regions that are

828: common in astronomy. In \citet{loh02a} for example, bootstrap on an

829: absorber catalog was done using slices as well as spheres, with

830: similar results for both types of resampling regions.

831:

832: Thirdly, the marks used for resampling are part of the original

833: estimate and are computed during the estimation

834: step. The only additional

835: computation required by the marked point bootstrap  involves selecting

836: points (that is, testing whether each point lies in a resampling region

837: or not), and keeping track of the number of times each point is

838: resampled. Unlike the block bootstrap, there is no need to

839: re-compute from scratch the estimates for each bootstrap sample.

840: This difference in computation is even greater for higher-order

841: statistics. These properties of the marked point bootstrap make it a

842: computationally feasible tool for analysis.

843:

844: Our study here suggests that

845: non-parametric bootstrap can yield valid estimates of errors under a

846: wide range of point patterns. The lack of specific model assumptions

847: means that the non-parametric bootstrap method, and in particular the

848: marked point bootstrap, can serve as an alternative and complementary method for

849: quantifying errors. Having estimates of errors obtained using Poisson

850: approximations, parametric and non-parametric bootstrap allows one to

851: have a better sense of the size of errors involved in an analysis.

852:

853: The simulation study performed here shows that bootstrap confidence

854: intervals do attain coverage close

855: to the nominal level, even for the clustered point patterns

856: where Poisson errors are known to be inaccurate, when sample sizes are

857: large. More specifically, bootstrap performance improves with

858: increasing number density, and also with increasing observation region

859: size relative to the correlation length.

860: Unfortunately, in astronomy, the correlation length may be of the same

861: scale as the observation region. If the values of $r$ at which the

862: correlation function estimates are computed are small relative to the

863: resampling blocks (and the observation region), then although the

864: bootstrap procedure would distort the dependence structure at the

865: large scales, it would still be valid for these smaller values of $r$.

866:

867: If, instead,

868: $\xi(r)$, say, for $r$ close to the size of the observation region

869: is of interest, then the bootstrap procedure would start to

870: break down, in the sense that the empirical coverage of confidence intervals may

871: not be close to the nominal level, and the bootstrap errors may not

872: reflect the true errors. In this case, the amount of information

873: contained in the data is smaller and the boundary effects are

874: magnified. With respect to the marked point bootstrap, larger blocks

875: are needed to capture the dependence structure at this larger

876: scale. For a fixed sample,

877: this cannot be done without reducing the variability of the bootstrap samples.

878: The method that might work best is parametric bootstrap, assuming that

879: the model is correct, and that the parameter values used are close to

880: the true values.

881: Non-parametric bootstrap can still be useful here. Firstly, it is at

882: least a better choice than Poisson errors, since the latter would

883: grossly underestimate the true errors. Secondly, it can provide additional

884: error estimates to compare with the errors obtained with the assumed

885: model. For these most challenging instances, having a variety of

886: methods can only be beneficial.

887:

888:

889:

890:

891:

892: \acknowledgments

893:

894: This research is supported in part by

895: National Science Foundation award AST-0507687.

896:

897:

898: \begin{thebibliography}{30}

899: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi

900:

901: \bibitem[{B{\"u}hlmann \& K{\"u}nsch(1999)}]{buhlmann99}

902: B{\"u}hlmann, P., \& K{\"u}nsch, H.~R. 1999, Computational Statistics and Data

903:   Analysis, 31, 295

904:

905: \bibitem[{Davis \& Peebles(1983)}]{davis83}

906: Davis, M., \& Peebles, P. J.~E. 1983, Astrophysical Journal, 267, 465

907:

908: \bibitem[{Davison \& Hinkley(1997)}]{davison97}

909: Davison, A.~C., \& Hinkley, D.~V. 1997, Bootstrap Methods and their

910:   Application (Cambridge: Cambridge University Press)

911:

912: \bibitem[{Diggle(2003)}]{diggle03}

913: Diggle, P.~J. 2003, Statistical Analysis of Spatial Point Patterns, 2nd edn.

914:   (London: Arnold)

915:

916: \bibitem[{Efron \& Tibshirani(1994)}]{efron94}

917: Efron, B., \& Tibshirani, R. 1994, An Introduction to the Bootstrap (New York:

918:   Chapman and Hall/CRC)

919:

920: \bibitem[{Eisenstein {et~al.}(2005)Eisenstein, Zehavi, Hogg, \&

921:   Scoccimarro}]{eisenstein05}

922: Eisenstein, D.~J., Zehavi, I., Hogg, D.~W., \& Scoccimarro, R. 2005,

923:   Astrophysical Journal, 633, 560

924:

925: \bibitem[{Hall(1985)}]{hall85}

926: Hall, P. 1985, Stochastic Processes and their Applications, 20, 231

927:

928: \bibitem[{Hamilton(1993)}]{hamilton93}

929: Hamilton, A. J.~S. 1993, Astrophysical Journal, 417, 19

930:

931: \bibitem[{Hewett(1982)}]{hewett82}

932: Hewett, P.~C. 1982, Monthly Notices of the Royal Astronomical Society, 201, 867

933:

934: \bibitem[{Jing \& B\"{o}rner(1998)}]{jing98}

935: Jing, Y.~P., \& B\"{o}rner, G. 1998, Astrophysical Journal, 503, 37

936:

937: \bibitem[{Kemball \& Martinsek(2005)}]{kemball05}

938: Kemball, A., \& Martinsek, A. 2005, Astronomical Journal, 129, 1760

939:

940: \bibitem[{Kerscher {et~al.}(2000)Kerscher, Szapudi, \& Szalay}]{kerscher2000}

941: Kerscher, M., Szapudi, I., \& Szalay, A.~S. 2000, Astrophysical Journal

942:   Letters, 535, L13

943:

944: \bibitem[{Kulkarni {et~al.}(2007)Kulkarni, Nichol, Sheth, Seo, Eisenstein, \&

945:   Gray}]{kulkarni07}

946: Kulkarni, G.~V., Nichol, R.~C., Sheth, R.~K., Seo, H.-J., Eisenstein, D.~J., \&

947:   Gray, A. 2007, Monthly Notices of the Royal Astronomical Society, 378, 1196

948:

949: \bibitem[{K{\"u}nsch(1989)}]{kunsch89}

950: K{\"u}nsch, H.~R. 1989, Annals of Statistics, 17, 1217

951:

952: \bibitem[{Lahiri(2003)}]{lahiri03}

953: Lahiri, S.~N. 2003, Resampling Methods for Dependent Data (New York: Springer)

954:

955: \bibitem[{Landy \& Szalay(1993)}]{landy93}

956: Landy, S.~L., \& Szalay, A.~S. 1993, Astrophysical Journal, 412, 64

957:

958: \bibitem[{Liu \& Singh(1992)}]{liu92}

959: Liu, R.~Y., \& Singh, K. 1992, in Exploring the Limits of Bootstrap, ed.

960:   R.~LePage \& L.~Billard (New York: Wiley), 225--248

961:

962: \bibitem[{Loh \& Stein(2004)}]{loh02a}

963: Loh, J.~M., \& Stein, M.~L. 2004, Statistica Sinica, 14, 69

964:

965: \bibitem[{{McCracken} {et~al.}(2007){McCracken}, Peacock, Guzzo, Capak,

966:   Porciani, Scoville, Aussel, Finoguenov, James, Kitzbichler, Koekemoer,

967:   Leauthaud, {Le F\`{e}vre}, Massey, Mellier, Mobasher, Norberg, Rhodes,

968:   Sanders, Sasaki, Taniguchi, Thompson, White, \& {El-Zant}}]{mccracken07}

969: {McCracken}, H.~J., Peacock, J.~A., Guzzo, L., Capak, P., Porciani, C.,

970:   Scoville, N., Aussel, H., Finoguenov, A., James, J.~B., Kitzbichler, M.~G.,

971:   Koekemoer, A., Leauthaud, A., {Le F\`{e}vre}, O., Massey, R., Mellier, Y.,

972:   Mobasher, B., Norberg, P., Rhodes, J., Sanders, D.~B., Sasaki, S.~S.,

973:   Taniguchi, Y., Thompson, D.~J., White, S. D.~M., \& {El-Zant}, A. 2007,

974:   Astrophysical Journal Supplement, 172, 314

975:

976: \bibitem[{Neyman \& Scott(1952)}]{neyman52}

977: Neyman, J., \& Scott, E.~L. 1952, Astrophysical Journal, 116, 144

978:

979: \bibitem[{Peebles \& Groth(1975)}]{peebles75}

980: Peebles, P. J.~E., \& Groth, E.~J. 1975, Astrophysical Journal, 196, 1

981:

982: \bibitem[{Politis \& Romano(1993)}]{politis93a}

983: Politis, D.~N., \& Romano, J.~P. 1993, Journal of Multivariate Analysis, 47,

984:   301

985:

986: \bibitem[{Politis {et~al.}(1999)Politis, Romano, \& Wolf}]{politis99}

987: Politis, D.~N., Romano, J.~P., \& Wolf, M. 1999, Subsampling (Berlin: Springer)

988:

989: \bibitem[{Ripley(1988)}]{ripley88}

990: Ripley, B.~D. 1988, Statistical Inference for Spatial Processes (New York:

991:   Wiley)

992:

993: \bibitem[{Shen {et~al.}(2007)Shen, Strauss, Oguri, \& et~al.}]{shen07}

994: Shen, Y., Strauss, M.~A., Oguri, M., \& et~al. 2007, Astronomical Journal, 133,

995:   2222

996:

997: \bibitem[{Simpson \& Mayer-Hasselwander(1986)}]{simpson86}

998: Simpson, G., \& Mayer-Hasselwander, H. 1986, Astronomy and Astrophysics, 162,

999:   340

1000:

1001: \bibitem[{Snethlage(1999)}]{snethlage99}

1002: Snethlage, M. 1999, Metrika, 49, 245

1003:

1004: \bibitem[{Stoyan {et~al.}(1995)Stoyan, Kendall, \& Mecke}]{stoyan95}

1005: Stoyan, D., Kendall, W.~S., \& Mecke, J. 1995, Stochastic Geometry and Its

1006:   Applications, \textnormal{2nd edition} (New York: John Wiley)

1007:

1008: \bibitem[{Szapudi \& Szalay(1998)}]{szapudi98}

1009: Szapudi, I., \& Szalay, A.~S. 1998, Astrophysical Journal, 494, L41

1010:

1011: \bibitem[{Waagepetersen(2007)}]{waag07}

1012: Waagepetersen, R. 2007, Biometrics, 63, 252

1013:

1014: \end{thebibliography}

1015:

1016:

1017:

1018: \end{document}

1019:

1020: