1:
2: \documentclass[12pt,preprint]{aastex}
3: %\usepackage{amsfonts}
4: %\usepackage{amssymb}
5: %\usepackage{epsfig}
6: %\usepackage[round]{natbib}
7:
8: %\setlength{\topmargin}{5bp}
9: %\setlength{\topskip}{0in}
10: %\setlength{\headsep}{0pt}
11: %\setlength{\headheight}{0pt}
12: %\setlength{\textwidth}{6.4in}
13: %\setlength{\textheight}{8in}
14: %\setlength{\footskip}{0.25in}
15: %\setlength{\oddsidemargin}{5bp}
16: %\setlength{\evensidemargin}{5bp}
17: %\renewcommand{\baselinestretch}{1.66}
18: %\setlength{\emergencystretch}{2em} % Add a little slop
19: %\DeclareMathSizes{12}{12}{10}{10} % Make large super/subscripts
20: %\setlength{\footnotesep}{0.6cm}
21:
22:
23: %\renewcommand{\baselinestretch}{2}
24: %\tolerance=500
25:
26:
27: \begin{document}
28: %
29:
30: %\bibliographystyle{apj}
31: %\bibliographystyle{elsart-harv}
32:
33: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
34:
35: \title{A valid and fast spatial bootstrap for correlation functions}
36: \shorttitle{Spatial bootstrap for correlation functions}
37: \author{Ji Meng Loh}
38:
39: \affil{Department of Statistics, Columbia University, New York, 10027,
40: USA}
41: \email{meng@stat.columbia.edu}
42:
43:
44: \begin{abstract}
45:
46: In this paper, we examine the validity of non-parametric spatial
47: bootstrap as a procedure to
48: quantify errors in estimates of $N$-point correlation functions.
We do this by means of a small simulation study with simple point
process models, in which we estimate the two-point correlation
functions and their errors. The coverage of confidence intervals
52: obtained using bootstrap is compared with those obtained from assuming
53: Poisson errors. The bootstrap procedure considered here is adapted for
54: use with spatial (i.e.\ dependent) data. In particular, we describe a
55: marked point bootstrap where, instead of resampling points or blocks
56: of points, we resample marks assigned to the data points. These marks
57: are numerical values that are based on the statistic of interest.
58: We describe how the marks are defined for the two- and three-point
59: correlation functions. By
60: resampling marks, the bootstrap samples retain more of the dependence
61: structure present in the data. Furthermore, this method of bootstrap
can be performed much more quickly than some other bootstrap methods for
63: spatial data, making it a more practical method with large datasets.
64: We find that with clustered point datasets, confidence intervals
obtained using the marked point bootstrap have empirical coverage
66: closer to the nominal level than the confidence intervals obtained
67: using Poisson errors. The bootstrap errors were also found to be
68: closer to the true errors for the clustered point datasets.
69:
70:
71:
72: \end{abstract}
73:
74: \keywords{methods:statistical}
75:
76: \section{Introduction} \label{sect:intro}
77:
78: In analyses of survey data, such as those of galaxies or quasars,
79: $N$-point correlation functions are often
80: estimated \citep[e.g.][]{kulkarni07, mccracken07, shen07}. These help to describe the structure of the observed
81: objects, such as the filamentary structure of galaxies that
82: has been observed, and to constrain the parameters of cosmological
83: models. Such estimates are only as important as their associated errors,
84: since it is the errors that indicate
85: the amount of agreement between two sets of data or between data and
86: simulations from a model.
87:
88: In spatial point processes, the expressions for the standard errors of
89: correlation
90: (and similar) functions have been worked out only for the simplest of
91: models. For example, \citet{ripley88} found approximations for the
92: variance of the $K$ function, an integral of the two-point correlation
93: function, for the Poisson process. These depend on such factors as the
94: shape of the observation region and the type of correction method for
95: boundary effects. \citet{landy93}, \citet{hamilton93} and others
96: worked out approximations for standard errors of various estimators of
97: the two-point correlation function under the Poisson or weakly
98: clustered models. Thus, very often, approximations such as Poisson
99: errors are used instead. However,
100: point data arising in astronomy are typically clustered and
101: non-Poisson. So while Poisson errors are useful and easy to compute,
102: they only serve as rough indications of the size of errors.
103:
104: Besides using Poisson errors, errors can also be estimated by using
105: mock catalogs generated from a cosmological model. This method was
106: employed in \citet{eisenstein05}, where the initial conditions for the
107: cosmological model were selected independently. In the statistics
108: literature, this is referred to as parametric bootstrap, although the term
109: more commonly refers to mock datasets generated from the model using
110: parameter values fixed at the estimates from the data, instead of
111: being independently chosen.
112:
113: An alternative is to use non-parametric bootstrap. This
114: involves generating new samples, called bootstrap samples, by
115: resampling from the actual data, and computing estimates for these new
116: samples. The distribution of these bootstrap estimates then serves as
117: a proxy for the actual distribution of the data estimates, so that
118: statistical inference, such as the construction of confidence
119: intervals, can be performed. Note that the procedure does not make any
specific model assumptions, so the errors obtained by this method
121: can serve as a check of model assumptions.
122: Due to the simplicity and flexibility of the non-parametric bootstrap,
123: the method is attractive. What is desirable then is to make the
124: non-parametric bootstrap procedure work as well as possible for data
125: that is correlated, and check that it performs satisfactorily, so that
126: it can be useful as a tool in analysis.
127:
128: This paper thus examines the non-parametric bootstrap, specifically
129: bootstrap of spatial data, where the dependence present in the data is
130: of interest. There were some early misconceptions about how bootstrap
131: should be applied to spatial data. The naive method of resampling
132: individual points does not work in the spatial context.
133: In order for spatial bootstrap to be valid, the underlying
134: dependence structure has to be preserved as much as possible when
135: generating bootstrap samples. Two common methods for doing this are the block
136: bootstrap and subsampling where blocks of data, instead of individual
137: data points, are resampled. We introduce these methods
138: in Section \ref{sect:bootspatial} and describe their shortcomings.
139: In Section \ref{sect:improve} we describe the marked point bootstrap
140: \citep{loh02a}
141: as a way to address these shortcomings. We describe how the marked
142: point bootstrap can be used with the two- and three-point correlation
143: function estimators, and by extension to estimators of $N$-point
144: correlation functions.
145: In Section \ref{sect:simstudy} we present results of a simulation
146: study using simple point process models comparing the empirical
147: coverage of confidence intervals obtained using non-parametric
148: bootstrap and using normal approximations with Poisson errors.
149:
150: In this paper, we restrict ourselves to constructing nominal 95\%
confidence intervals, i.e.\ these confidence intervals are supposed to
152: contain the true value 95\% of the time. The empirical coverage of the
153: confidence intervals is the actual confidence level achieved by the
confidence intervals. In a simulation study with a known model, the
empirical coverage can be obtained by finding the proportion of confidence
intervals that contain the true value, which is then compared with the nominal
level. It is desirable, of course, for the empirical coverage to be
158: close to the nominal level. Furthermore, it is often better for the
159: empirical coverage to be higher instead of lower than the nominal
160: level, so that the procedure is conservative.
161:
162: Bootstrap is a computationally intensive procedure. With the large
163: datasets now common in astronomy, even computing the $N$-point
correlation functions poses computational challenges. For example,
165: \citet{eisenstein05} avoided using the jackknife procedure for error
166: estimation because of the size of the data they used.
167: The way the
168: marked point bootstrap is formulated, however, makes it much faster than
169: subsampling (a generalization of the jackknife) and the
170: block bootstrap, so that applying the procedure to large datasets is
171: feasible as long as computing the actual estimates is
172: feasible. In Section \ref{sect:simstudy} we provide some
173: time measurements of the procedure used in our simulation study.
174:
175:
176: \section{Non-parametric bootstrap for spatial data}
177: \label{sect:bootspatial}
178:
179: The non-parametric bootstrap was originally developed for independent data
180: \citep{efron94}. The main idea
181: is to draw new samples from the actual data by sampling with
182: replacement a data point at a time. Bootstrap estimates of the same
183: statistic are computed from the bootstrap samples. With these
184: bootstrap estimates, confidence intervals, for example, can then be
185: constructed. This can be done in a variety of ways. Suppose $K$, $\hat{K}$ and
$\hat{K}^*_i$, $i=1, \ldots, B$, are respectively the quantity of
187: interest, the estimate of $K$ computed from the data and the bootstrap
188: estimates, with $B$ equal to the number of bootstrap samples.
189: A simple method, called the basic
190: bootstrap interval in \citet{davison97}, is to set
191: \begin{eqnarray}
192: & [2\hat{K}-\hat{K}^*_{(B+1)(1-\alpha/2)},
193: 2\hat{K}-\hat{K}^*_{(B+1)\alpha/2} ] & \label{eqn:basicCI}
194: \end{eqnarray}
195: as the $100(1-\alpha)$\% confidence interval for $K$. Here, $B$, the number
196: of bootstrap samples, is large, say, 999, and $\hat{K}^*_u$ is
the $u$-th ordered value of the bootstrap statistic. So, for example, with
198: $B=999$ bootstrap samples, a 95\% confidence interval for $K$ is given
199: by $[2\hat{K}-\hat{K}^*_{975},
200: 2\hat{K}-\hat{K}^*_{25} ]$. In our simulation studies, we use
201: (\ref{eqn:basicCI}) to construct the confidence intervals. Standard
202: errors for $\hat{K}$ are estimated by the standard deviation of the
203: bootstrap estimates $\hat{K}^*$.
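
As a minimal sketch of (\ref{eqn:basicCI}), assuming the bootstrap
estimates are already available as a list of numbers, the interval and
the bootstrap standard error could be computed as follows. The function
name \texttt{basic\_bootstrap\_interval} and the simulated values in the
usage lines are hypothetical and serve only as illustration.

```python
import random
import statistics

def basic_bootstrap_interval(k_hat, boot_estimates, alpha=0.05):
    """Basic bootstrap interval [2*k_hat - K*_(hi), 2*k_hat - K*_(lo)]."""
    b = len(boot_estimates)
    ordered = sorted(boot_estimates)
    # 1-based ranks of the order statistics; B = 999 and alpha = 0.05
    # give ranks 25 and 975.
    lo_rank = round((b + 1) * alpha / 2)
    hi_rank = round((b + 1) * (1 - alpha / 2))
    if lo_rank < 1 or hi_rank > b:
        raise ValueError("too few bootstrap samples for this alpha")
    return (2 * k_hat - ordered[hi_rank - 1], 2 * k_hat - ordered[lo_rank - 1])

# Toy usage with made-up bootstrap estimates scattered around k_hat = 1.0.
rng = random.Random(0)
boot = [rng.gauss(1.0, 0.1) for _ in range(999)]
lower, upper = basic_bootstrap_interval(1.0, boot)
std_err = statistics.stdev(boot)  # bootstrap standard error of the estimate
```

Rounding the ranks is exact when $(B+1)\alpha/2$ is an integer, which is
the case for $B=999$ and $\alpha=0.05$.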
204:
205: While there are other methods of constructing confidence intervals
206: from bootstrap samples \citep[see][for example]{davison97}, the interest here is in the method of
207: generating the bootstrap samples, when the data is
spatial. \citet{snethlage99} rightly concludes that resampling
individual points does not work. If the resampled points are placed in
their original positions in the observation region, there will be
multiple points at single locations, which does not occur in most
data sets. In their claim that bootstrap cannot be used for
analysis of clustering, \citet{simpson86} were also considering
214: bootstrap in terms of resampling individual points.
215:
216: Due to the success of bootstrap for resampling independent data, it
217: has been extended to resample dependent data. Most of this work is for
218: time series, but can easily be applied to spatial data in two and
219: three dimensions. A common method is the block bootstrap: blocks of
220: the spatial data are sampled at random, then joined together to form a new
221: sample \citep{hall85, kunsch89, liu92}. Asymptotic arguments for the
222: validity of the bootstrap involve
223: limiting the range of dependence, increasing the observation region size
224: and letting the resampling block size increase but at a slower rate
225: than the observation region size. In this asymptotic setting, we then
226: have many almost independent blocks of data, with each block itself
227: containing a large subsample \citep[see e.g.][]{lahiri03}.
228: However, the assumed conditions necessary to make the
calculations tractable also mean that normal approximations work well too.
230: Some theoretical results show that the accuracy of bootstrap
231: estimates is of a higher order than the normal approximations. Whether
232: this difference is meaningful in actual practice is less clear.
233: We believe that the role of non-parametric bootstrap is to serve as another
234: objective method to obtain standard errors that do not make any model
235: assumptions. Error estimates obtained using bootstrap can be used as a
236: way to assess or compare with other estimates of
237: errors.
238:
239:
240: \citet{kemball05} is a recent work in the astronomy literature
241: that examined bootstrap for dependent data. For the non-parametric
242: bootstrap, they focused on
243: subsampling \citep{politis93a, politis99}, which can be considered
244: a generalization of the jackknife procedure. In subsampling,
245: random portions of the data are deleted, and the remaining data are
246: treated as bootstrap samples. The standard deviation of the estimates computed
247: from these samples serves as an estimate of the error, less a factor
248: to adjust for the smaller subsamples and the large overlap.
249:
When estimating correlation functions, pairs, triplets, etc.\ of points
have to be counted. By joining independently resampled blocks together to
form the bootstrap sample, the block bootstrap creates artificial
253: configurations of points across the resampling blocks and distorts the
254: dependence structure in the data. This does not matter in asymptotic
255: arguments because the effect becomes negligible if the range of the
256: correlation is fixed while the resampling blocks increase in
257: size. However, \citet{loh02a} found that the
258: actual coverage achieved by confidence intervals obtained using block
259: bootstrap can be much lower than the nominal percentage level for finite samples.
260:
261: In subsampling, no artificial configurations of points are
262: created. However, while the correction weight accounts for the
263: difference in sample sizes between the bootstrap samples and the
264: actual data set, it does not account for the change in the boundary
265: effects due to the different resampling regions. Since subsampling
266: uses smaller regions as the bootstrap observation regions, boundary
267: effects are magnified.
268: For subsampling, there is the temptation to use large
269: subsamples to try and retain more of the dependence structure, but
270: like block bootstrap, theoretical justification of the method requires
271: that the subsamples be small in size relative to the actual data set.
272: \citet{loh02a} also found that subsampling can
273: yield confidence intervals that attain very low empirical
274: coverage. They also found that the subsampling method is sensitive to
275: the fraction of the data used for subsampling.
276:
277: \citet{loh02a}
278: proposed another version of spatial bootstrap, called marked point
279: bootstrap, that reduces the effect of joining independent blocks and
280: produces confidence intervals that achieve coverage closer to the
281: nominal level. This is described in the next section, where we also
282: show how it can be applied to the two- and three-point correlation
283: function estimators commonly used in astronomy.
284:
285:
286:
287: \section{Improving the non-parametric bootstrap of spatial data}
288: \label{sect:improve}
289:
290: Suppose $N$ points are observed in a region $A$. Furthermore, suppose
291: that the quantity of interest $K$ can be estimated using an estimator
292: of the form
293: \begin{eqnarray}
294: \hat{K} & = & \frac{1}{N}\sum_{i=1}^N\sum_{j=1 \atop j\ne i}^N
295: \phi(x_i, x_j) \equiv \hat{\Phi}/N.
296: \label{eqn:estimator}
297: \end{eqnarray}
298: Note that each point $i$ has an associated quantity $\sum_{j=1, j\ne i}^N
299: \phi(x_i,x_j)$, the inner sum of equation (\ref{eqn:estimator}).
300: Estimators of two-point statistics can be expressed in this form. In
301: this case, the
302: quantity $\phi(x_i,x_j)$ will depend on the distance
303: between $x_i$ and $x_j$. As an
304: aside, note that estimators of three-point statistics
305: can be written in a similar form, with the inner sum replaced by a
306: double sum.
307:
308: With point data, the term ``mark'' is used to refer to some additional
309: information associated with a point. This is usually some actual
310: measured value. For galaxy data, for example, marks could be
311: quantities such as luminosity, color and so on. In this paper, the
312: bootstrap method considered uses marks associated with the
313: points. However, these marks are not quantities such as luminosity
314: that are directly measured. Instead they are numerical quantities that
315: we construct and associate with the points. The actual values of these
316: marks are not random, but are constructed so that they relate to the
317: statistic that is of interest. If the statistic of interest is given
318: by equation (\ref{eqn:estimator}), then the mark associated with point
319: $i$, denoted by $m_i$, is equal to $\sum_{j=1, j\ne i}^N
320: \phi(x_i,x_j)$, so that\ $\hat{\Phi} = \sum_i m_i$. At the risk of
321: being repetitive, suppose that
322: $\hat{\Phi} \equiv DD(r) = \sum_{x\in D} \sum_{y\in D:y\ne x} 1\{ |x-y| \in
323: (r-dr,r+dr)\}$, for some $r$, is of
324: interest. This quantity is used in estimators of the
325: two-point correlation function, and is the number of pairs of points
326: separated by (roughly) distance $r$. Then the mark associated with
327: point $x$ is $\sum_{y\in D:y\ne x} 1\{ |x-y| \in (r-dr,r+dr)\}$, the
328: number of points that are roughly distance $r$ away from $x$. Note
329: that the sum of all the marks gives back the value of $DD(r)$. It is
330: also important to note that to compute the estimate
331: (\ref{eqn:estimator}), the marks have to be calculated anyway.
332: In regular applications, the algorithm doing the estimation does not
333: individually record these marks, but keeps a running sum of the marks.
334: In order to do the marked point bootstrap, the
335: difference in terms of the code is that the marks now have to be
336: stored so that they can be used in the bootstrap step.
337:
338: In the block bootstrap, blocks of data
339: points are resampled and then joined together, forming a new dataset
340: from which $K$ is estimated using the new configuration of points that
341: was generated, yielding $\hat{K}^*$. In the marked point bootstrap,
342: blocks can be used to resample points as well. However, the crucial
343: difference is that the bootstrap estimate is computed, not from how
344: the resampled points are positioned, but from the marks that are associated
345: with these points. In other words, the marked point bootstrap
346: resamples the marks rather than the points and the bootstrap estimate is
347: computed by summing these resampled marks.
348:
To be more precise, suppose that $N^*$ points have been resampled,
with the resampled points denoted by $x_j^*, j=1, \ldots,
N^*$. Associated with each $x_j^*$ is a mark $m_{j^*}$. We denote this
352: mark $m_{j^*}$ rather than $m_j^*$ to emphasize the fact that these
353: marks are sampled from the actual data, i.e.\ computed from the
354: original dataset and not from the bootstrap sample. Then the bootstrap
355: estimate of $K$ is given by the average of the resampled marks: $$\hat{K}^*
356: = \hat{\Phi}^*/N^* =
357: \sum_{j=1}^{N^*} m_{j^*}/N^*,$$ just like $\hat{K}$ is given by the average of
358: the actual marks. Note that in an actual implementation of the
359: procedure, all that is required is keeping track of how many times each
360: point is resampled.
361: The step-by-step procedure for estimating and resampling the
362: quantity (\ref{eqn:estimator}) is as follows:
363:
364: \begin{enumerate}
365: \item For each point $i$, calculate $m_i = \sum_{j=1, j\ne i}^N
366: \phi(x_i,x_j)$.
367: \item Obtain the estimate $\hat{K}$, using $\hat{K}= \sum_i m_i/N$.
368: \item Resample the points. This can be done by randomly placing blocks
369: on to the observation region and keeping track of which point is
370: resampled. Suppose point $i, i=1,\ldots , N$ is resampled $n^*_i$ times, and
371: $N^*=\sum_i n_i^*$.
\item The bootstrap estimate is then $\hat{K}^* = \sum_i (n^*_i \times
m_i)/N^*$.
374: \item Repeat steps 3 and 4 to get $B$ bootstrap estimates.
375: \item Construct a confidence interval using (\ref{eqn:basicCI}).
376: \end{enumerate}
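
The steps above can be sketched in code as follows. This is only an
illustration under stated assumptions: the observation region is taken
to be the unit square, resampling uses fixed square blocks drawn with
replacement, and the function names and the choice of $\phi$ (an
indicator that two points are closer than 0.05) are hypothetical.

```python
import math
import random

def compute_marks(points, phi):
    """Step 1: m_i = sum over j != i of phi(x_i, x_j)."""
    return [sum(phi(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def block_of(point, n_side):
    """Index of the fixed square block of the unit square containing point."""
    col = min(int(point[0] * n_side), n_side - 1)
    row = min(int(point[1] * n_side), n_side - 1)
    return row * n_side + col

def marked_point_bootstrap(points, marks, n_side, n_boot, rng):
    """Steps 3 to 5: resample blocks with replacement, average the stored marks."""
    if not points:
        raise ValueError("need at least one data point")
    n_blocks = n_side * n_side
    # Sum of marks and number of points per block, computed once from the data.
    block_sum = [0.0] * n_blocks
    block_count = [0] * n_blocks
    for p, m in zip(points, marks):
        b = block_of(p, n_side)
        block_sum[b] += m
        block_count[b] += 1
    estimates = []
    while len(estimates) < n_boot:
        # As many blocks as tile the region, so the resampled area matches it.
        chosen = [rng.randrange(n_blocks) for _ in range(n_blocks)]
        n_star = sum(block_count[b] for b in chosen)
        if n_star == 0:
            continue  # assumption: redraw when no point was resampled
        estimates.append(sum(block_sum[b] for b in chosen) / n_star)
    return estimates

# Toy usage: phi counts neighbours closer than 0.05 (a hypothetical choice).
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]
phi = lambda p, q: 1.0 if math.hypot(p[0] - q[0], p[1] - q[1]) < 0.05 else 0.0
marks = compute_marks(pts, phi)
k_hat = sum(marks) / len(pts)  # step 2
boot = marked_point_bootstrap(pts, marks, n_side=3, n_boot=199, rng=rng)
```

Only the per-block sums of marks and point counts are needed inside the
resampling loop, so no inter-point distance is recomputed for the
bootstrap samples.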
377:
378: A few remarks about the procedure are in order.
379:
380: \noindent {\bf Remark 1} Instead of randomly placing blocks, the observation region can
381: be divided into a number of subregions, and the regions selected
382: randomly with replacement. This latter method is sometimes referred
383: to as using fixed blocks as opposed to moving blocks. It
384: is generally considered that the moving blocks bootstrap works
385: better in terms of convergence rates in asymptotic arguments.
386:
\noindent {\bf Remark 2} The number of blocks is chosen so that the total area/volume of
388: the blocks is equal to the original area/volume of the observation
389: region. Note that in this case $N^*$ would usually not be equal to
390: $N$, though they will be of the same order of magnitude. However,
391: this does not pose problems since the statistic $\hat{K}$ is a mean
392: of the marks.
393:
\noindent {\bf Remark 3} There is no real consensus on the size of the resampling blocks
395: to use. \citet{buhlmann99} did some work on determining the optimal block size
396: from data. Intuitively, the procedure needs large blocks so that
397: the correlation structure is less distorted, and a large enough
398: number of blocks so that there is enough variability between
bootstrap samples. If $b$ represents the number of blocks and $N$
the number of data points (which is assumed to increase with the
observation region size), theoretical work in e.g.\ \citet{lahiri03}
suggests that consistency is
achieved as $b\to \infty$ and $N/b \to \infty$. Thus some trade-off
404: is needed. A rule-of-thumb is to divide each dimension
405: of the observation region into at least three parts, i.e.\ nine
406: blocks in 2D, 27 blocks in 3D. This would ensure enough variability
407: between bootstrap samples. Of course, for correlation functions, the
408: maximum value of the separation distance $r$ at which these
409: functions are estimated would influence the
410: decision on block size.
411: Fortunately, \citet{loh02a} found that the marked point bootstrap is less
412: sensitive to block size than block bootstrap or subsampling: they
413: resampled an absorber catalog using slices of the sphere and found that
414: the bootstrap errors were similar for different sizes of the
slices. Our simulation results also show little difference between
different block sizes.
417:
418:
419:
420: There are a few advantages to this form of spatial bootstrap over the
421: regular block bootstrap. Since the bootstrap estimates are based on
422: the resampled marks and not on marks recalculated from the bootstrap
423: sample, the contribution to the bootstrap estimate is due to actual
424: pairs of points in the original dataset. This helps to minimize the
425: distortion of dependence structure in the dataset due to
426: resampling.
427:
428: Furthermore, for any block of resampled points,
429: information about the points just outside the block (and therefore not
430: sampled by this particular block) is captured by the marks associated
431: with the points that are sampled by the block. This helps to reduce
432: the variability of bootstrap results due to the size of the
433: resampling blocks, compared to block bootstrap or subsampling.
434: Also, since the resampling blocks do not need to be
435: joined together to form a contiguous region for the bootstrap sample,
436: there is flexibility in the choice of the shape of the resampling regions.
437:
438: Lastly, the marked point bootstrap can be performed relatively quickly
439: compared to block bootstrap or subsampling. The marks that are
440: associated with the points are
441: part of the actual estimator and are already computed in the
442: estimation step. Resampling using the marked point bootstrap only
443: involves identifying which points are resampled with each resampling
444: region, and keeping track of how many times each point is
445: resampled. Inter-point distances and edge correction weights do not
446: have to be recalculated. With $N$ data points and $B$ bootstrap
447: samples, block bootstrap will take roughly $BN^2$ computations for a
448: statistic involving pairs of points. The marked point bootstrap will
449: involve roughly $N^2 + BN$ computations. The difference will be more
450: marked for three-point computations.
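
As a rough illustration of these operation counts, take $N = 5\times
10^4$ points (comparable to the larger dataset considered in Section
\ref{sect:simstudy}) and $B = 999$ bootstrap samples. These values are
chosen only for illustration, and constant factors are ignored. Then
\begin{eqnarray}
BN^2 & = & 999 \times (5\times 10^4)^2 \approx 2.5\times 10^{12}, \nonumber \\
N^2 + BN & = & 2.5\times 10^{9} + 5.0\times 10^{7} \approx 2.55\times
10^{9}, \nonumber
\end{eqnarray}
so that, under these assumptions, the marked point bootstrap requires
roughly a factor of $10^3$ fewer operations, with the initial estimation
step rather than the resampling dominating its cost.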
451:
452: Simulation studies done in \citet{loh02a} showed that the empirical
453: coverages of confidence intervals obtained using the marked point
454: bootstrap can be much closer to the nominal 95\% level than those
455: obtained with block bootstrap or subsampling.
456:
457: We now describe how the marked point bootstrap can be used with
458: estimators of the two-point correlation function. The common
459: estimators of the two-point correlation function $\xi(r)$ are
460: \begin{eqnarray}
461: \hat{\xi}_{Nat}(r) & = & \frac{dd(r)}{rr(r)}-1, \label{eqn:nat} \\
462: \hat{\xi}_{DP}(r) & = & \frac{dd(r)}{dr(r)}-1, \label{eqn:dp} \\
463: \hat{\xi}_{Ham}(r) & = & \frac{dd(r)\cdot rr(r)}{dr(r)^2} -1, \label{eqn:ham}\\
464: \hat{\xi}_{Landy}(r) & = & \frac{dd(r)-2dr(r)}{rr(r)}+1, \label{eqn:landy} \\
465: \hat{\xi}_{Hewett}(r) & = & \frac{dd(r)-dr(r)}{rr(r)}, \label{eqn:hewett}
466: \end{eqnarray}
467: which are, respectively, the natural estimator \citep{kerscher2000}, and
468: estimators due to \citet{davis83, hamilton93, landy93, hewett82},
469: where $r$ is
some distance of interest. In these expressions, $dd(r) = DD(r)/N^2$, $dr(r) =
DR(r)/(NN_R)$ and $rr(r)=RR(r)/N_R^2$, where $DD(r)=\sum_{x\in D} \sum_{y\in D: y\ne
x} 1\{ |x-y| \in (r-dr, r+dr)\}$,
$DR(r)=\sum_{x\in D} \sum_{y\in R} 1\{ |x-y| \in (r-dr, r+dr)\}$
and $RR(r)=\sum_{x\in R} \sum_{y\in R: y\ne x} 1\{ |x-y| \in (r-dr, r+dr)\}$,
475: $R$ is a set of randomly generated points
476: (i.e.\ Poisson) in the observation region $A$, and $N$ and $N_R$ are
477: respectively the number of points in the real and random data sets.
478:
479: To apply the marked point bootstrap, assign to
480: each point $x$ of the dataset marks $m_{x,1}=\sum_{y\in D: y\ne
481: x} 1\{ |x-y| \in (r-dr, r+dr)\}$ and $m_{x,2}=\sum_{y\in R} 1\{
482: |x-y| \in (r-dr, r+dr)\}$. Bootstrap proceeds by resampling blocks of
483: points and recording the marks associated with them. For a bootstrap
sample, $x^*_j, j=1, \ldots, N^*$, we then have
485: $$DD^*(r) = \sum_{j=1}^{N^*} m_{x^*_j,1}, \qquad
486: DR^*(r) = \sum_{j=1}^{N^*} m_{x^*_j,2},$$
487: and bootstrap estimates of the two-point correlation functions are
488: then obtained by substituting the above into
489: (\ref{eqn:nat})-(\ref{eqn:hewett}).
490: If each point $x_i$ of the actual data is resampled $n_i^*$ times, so
491: that $N^* = \sum_i n_i^*$, $DD^*(r)$ and $DR^*(r)$ can also be written
492: as
493: $$DD^*(r) = \sum_{i=1}^N (n_i^* \times m_{x_i,1}), \qquad DR^*(r) =
494: \sum_{i=1}^N (n_i^* \times m_{x_i, 2}).$$
495:
496: Note that $RR$ does not need to be resampled, since it is used as an
497: approximation to an integral and has nothing to do with the actual
498: data. If, as is usually the case, estimation of $\xi(r)$ is needed for a
499: range of values of $r$, then the marks $m_{x,1}$ and $m_{x,2}$ would
500: be vectors, containing the relevant values for each value of $r$.
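
A minimal sketch of the two-point marks and of one bootstrap estimate,
for a single value of $r$ and using the \citet{hamilton93} estimator, is
given below. The function names are hypothetical. In addition, the
bootstrap estimate is normalized by replacing $N$ with $N^*$, which is
one possible reading of the substitution described above and should be
treated as an assumption.

```python
import math
import random

def in_bin(p, q, r, dr):
    """Indicator 1{|p - q| in (r - dr, r + dr)}."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return r - dr < d < r + dr

def two_point_marks(data, rand, r, dr):
    """Marks m_{x,1} (data neighbours) and m_{x,2} (random neighbours)."""
    m1 = [sum(in_bin(x, y, r, dr) for j, y in enumerate(data) if j != i)
          for i, x in enumerate(data)]
    m2 = [sum(in_bin(x, y, r, dr) for y in rand) for x in data]
    return m1, m2

def rr_count(rand, r, dr):
    """RR(r): ordered pairs of distinct random points in the distance bin."""
    return sum(in_bin(x, y, r, dr)
               for i, x in enumerate(rand) for j, y in enumerate(rand) if i != j)

def hamilton(dd_count, dr_count, rr, n, n_r):
    """dd*rr/dr^2 - 1 with dd = DD/n^2, dr = DR/(n n_r), rr = RR/n_r^2."""
    dd_norm = dd_count / n**2
    dr_norm = dr_count / (n * n_r)  # raises ZeroDivisionError below if DR = 0
    rr_norm = rr / n_r**2
    return dd_norm * rr_norm / dr_norm**2 - 1

def bootstrap_hamilton(m1, m2, rr, n_r, counts):
    """One bootstrap estimate from resampling counts n_i^*.

    Assumption: N is replaced by N^* in the normalizations; RR is not resampled.
    """
    n_star = sum(counts)
    dd_star = sum(c * m for c, m in zip(counts, m1))
    dr_star = sum(c * m for c, m in zip(counts, m2))
    return hamilton(dd_star, dr_star, rr, n_star, n_r)

# Toy usage on uniformly scattered points in the unit square.
rng = random.Random(2)
data = [(rng.random(), rng.random()) for _ in range(100)]
rand = [(rng.random(), rng.random()) for _ in range(400)]
m1, m2 = two_point_marks(data, rand, r=0.1, dr=0.02)
rr = rr_count(rand, r=0.1, dr=0.02)
xi_hat = hamilton(sum(m1), sum(m2), rr, len(data), len(rand))
# Resampling every point exactly once reproduces the data estimate.
xi_same = bootstrap_hamilton(m1, m2, rr, len(rand), [1] * len(data))
```

In practice the resampling counts would come from the block resampling
step rather than being set by hand as in the last line.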
501:
502: Estimators of the three-point correlation function can be bootstrapped
503: in a similar way. For example, an estimator of the three-point
504: correlation function is
505: \begin{eqnarray}
506: \zeta & = & \frac{ddd-ddr}{rrr} + 2, \label{eqn:3ptest}
507: \end{eqnarray}
introduced by \citet{peebles75}, where $ddd = DDD/N^3$, $ddr =
DDR/(N^2N_R)$ and $rrr = RRR/N_R^3$, and $DDD, DDR, RRR$ are counts
510: of triplets of points with the desired configuration, $DDD$ with all
511: points from the real data set and so on. The contribution to $DDD$
512: by any particular triplet of points is divided by 3 and assigned as
513: marks to each of the three points. For any individual point, all
514: these marks are summed together. For $DDR$, the contribution by each
515: triplet is
516: divided by 2 and assigned to the two real
data points. Bootstrap proceeds by resampling the real data points, and
the values of $DDD^*$ and $DDR^*$ are found by
adding the marks of the resampled points.
520: Substituting these into
521: (\ref{eqn:3ptest}) gives the bootstrap estimate. Other similar
522: estimators, such as the three-point
523: estimator of \citet{jing98} or the $N$-point estimators of
524: \citet{szapudi98}, can be bootstrapped in the same way.
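
The assignment of $DDD$ marks can be sketched as follows. The sketch
assumes that $DDD$ counts unordered triplets and uses a hypothetical
configuration test (all three pairwise distances below some
\texttt{r\_max}); neither choice is prescribed by the text, and the
function names are illustrative only.

```python
import itertools
import math

def all_within(r_max):
    """Hypothetical configuration test: all pairwise distances below r_max."""
    def test(a, b, c):
        return all(math.hypot(p[0] - q[0], p[1] - q[1]) < r_max
                   for p, q in ((a, b), (a, c), (b, c)))
    return test

def ddd_marks(data, is_configuration):
    """Per-point DDD marks: each qualifying triplet adds 1/3 to each member."""
    marks = [0.0] * len(data)
    for i, j, k in itertools.combinations(range(len(data)), 3):
        if is_configuration(data[i], data[j], data[k]):
            for idx in (i, j, k):
                marks[idx] += 1.0 / 3.0
    return marks

def ddd_star(counts, marks):
    """Bootstrap triplet count: sum of the marks of the resampled points."""
    return sum(c * m for c, m in zip(counts, marks))

# Toy usage: three nearby points form the only qualifying triplet.
pts3 = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.9, 0.9)]
marks3 = ddd_marks(pts3, all_within(0.05))
ddd_total = sum(marks3)                    # recovers DDD for the data
ddd_boot = ddd_star([2, 0, 1, 1], marks3)  # resampling counts n_i^*
```

The $DDR$ marks would be built in the same way, with each qualifying
triplet contributing $1/2$ to each of its two real data points.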
525:
526:
527: \section{Simulation study}
528: \label{sect:simstudy}
529:
530: We performed a simple simulation study to compare the performance of
531: confidence intervals obtained using the marked point bootstrap with
532: those obtained using normal approximations with Poisson errors,
533: varying the observation region size, number density and point process
534: model. For computational simplicity, we restrict to two dimensions.
535: We also performed an additional study with a large observation region
536: and approximately 50,000 points, showing the applicability of the
537: marked point bootstrap to datasets of size comparable to current
538: astronomy datasets.
539:
540: We used the Poisson point process model and a Neyman-Scott model to
541: generate the data points. The Neyman-Scott model is of historical
542: interest in astronomy as a model for galaxies \citep{neyman52}. It is
543: still commonly
544: used to model point data in other fields \citep{diggle03, waag07}. We
545: chose the Neyman-Scott model as it is a model for clustered data with
546: closed-form expressions for the two-point correlation
547: function. The Neyman-Scott point datasets that we used are generated as
548: follows: parent points are distributed as a Poisson point
process with intensity $\lambda_p$. A Poisson number of offspring
points, with mean $m$, is then randomly scattered
about each parent point. The collection of offspring points forms the
552: point process. We set the dispersion function of offspring points
553: about parent points to be a bivariate normal density centered at the parent
554: point, with standard deviation $\sigma$. This specific Neyman-Scott
555: model is sometimes referred to as the modified Thomas model
556: \citep{stoyan95}. The
557: two-point correlation
558: function, $\xi(r)$, is zero for the
559: Poisson model, while
560: $$\xi(r) = \frac{1}{4\pi \lambda_p \sigma^2}\exp\left\{
561: -\frac{r^2}{4\sigma^2} \right\}$$ for the modified Thomas model.
562: Thus the point pattern from a modified Thomas model is more clustered
563: if $\lambda_p$ or $\sigma$ is smaller. The quantity $\sigma$ also
564: controls the range of the correlation, with the range larger for
565: larger values of $\sigma$. We used several
566: values for $\lambda_p, m$ and $\sigma$ in our simulation study.
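
A realization of the modified Thomas model and its true $\xi(r)$ could
be generated along the following lines. This is a sketch and not the
code used for the simulation study: the padding of the parent window by
$4\sigma$ to limit edge effects, the simple Poisson sampler, the
function names and the parameter values in the usage lines are all
assumptions made for illustration.

```python
import math
import random

def poisson(mean, rng):
    """Poisson variate by Knuth's multiplication method (moderate means only)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def modified_thomas(lambda_p, m, sigma, rng, side=1.0):
    """Offspring of a modified Thomas process that fall in [0, side]^2."""
    pad = 4 * sigma  # assumption: parents live in a padded window
    width = side + 2 * pad
    points = []
    for _ in range(poisson(lambda_p * width**2, rng)):
        px = rng.uniform(-pad, side + pad)
        py = rng.uniform(-pad, side + pad)
        # Poisson number of offspring, each displaced by an isotropic normal.
        for _ in range(poisson(m, rng)):
            x, y = rng.gauss(px, sigma), rng.gauss(py, sigma)
            if 0 <= x <= side and 0 <= y <= side:
                points.append((x, y))
    return points

def xi_thomas(r, lambda_p, sigma):
    """True two-point correlation function of the modified Thomas model."""
    return math.exp(-r**2 / (4 * sigma**2)) / (4 * math.pi * lambda_p * sigma**2)

# Toy usage with hypothetical parameter values.
rng = random.Random(3)
pts = modified_thomas(lambda_p=50, m=20, sigma=0.02, rng=rng)
xi_true = [xi_thomas(0.01 * k, 50, 0.02) for k in range(1, 11)]
```

Only the offspring points are returned, since the parent points are not
part of the point process.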
567:
568: For each point process model, we generated 500 realizations on the
569: unit square. For each
570: realization, we estimated $\xi(r)$ for $r=0.01, \ldots ,
571: 0.1$. Bootstrap estimates were then produced from each realization
572: and a nominal 95\% confidence interval constructed. Thus for each
573: point process model, we have 500 95\% confidence intervals. We then
checked the empirical coverage, i.e.\ the proportion of these that
575: contained the true value of
576: $\xi(r)$, with proportion closer to 95\% being desirable. We also
577: constructed 500 confidence intervals using the normal approximation with
578: Poisson errors. The Poisson error $e_p$ is the inverse of the pair
579: counts for an uncorrelated data set of the same size as the actual data, as
580: given by \citet{landy93}. The
95\% confidence intervals for $\xi$ based on the normal approximation
are thus given by $(\hat{\xi}\pm
2e_p)$. We then found the empirical coverage of these confidence
584: intervals. We then repeated the procedure for the $2\times 2$ and
585: $4\times 4$ squares. The results are summarized in Figures
586: \ref{fig:HamPoiCoverage} to \ref{fig:Booterrors}.
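
The coverage calculation can be illustrated with a short sketch. The helper names \texttt{normal\_interval} and \texttt{empirical\_coverage} and the array layout (one row per realization, one column per value of $r$) are assumptions made for this example. The error estimates are taken as given inputs, since their computation is not part of the sketch.

```python
import numpy as np


def normal_interval(xi_hat, err, z=2.0):
    """Normal-approximation interval xi_hat +/- z * err (z = 2 for roughly 95%)."""
    xi_hat = np.asarray(xi_hat, dtype=float)
    err = np.asarray(err, dtype=float)
    return xi_hat - z * err, xi_hat + z * err


def empirical_coverage(lower, upper, xi_true):
    """Proportion of intervals containing the true value, for each r.

    lower, upper : arrays of shape (n_realizations, n_r)
    xi_true      : array of shape (n_r,), broadcast across realizations
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    xi_true = np.asarray(xi_true, dtype=float)
    covered = (lower <= xi_true) & (xi_true <= upper)
    return covered.mean(axis=0)
```

With 500 realizations, \texttt{empirical\_coverage} returns one proportion per value of $r$, which is the quantity plotted against $r$ in the coverage figures. The same function applies to bootstrap intervals once their lower and upper limits are available.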
587:
588: Figure \ref{fig:HamPoiCoverage} shows plots of the empirical coverage
589: of nominal 95\% confidence intervals of the two-point correlation
590: function for the Poisson process model, using the \citet{hamilton93}
591: estimator. Simulation results for the other estimators are similar and are not
592: shown. The thick solid lines in the plots show the empirical coverage
593: of confidence intervals obtained using normal approximation with
594: Poisson errors. Note that Poisson errors are correct in this case and
595: we find that the empirical coverage is close to 95\% for all the point
596: densities and observation region sizes considered.
597:
598: The thin lines represent the empirical coverage of confidence
599: intervals obtained from the marked point bootstrap, with the different
600: line types representing different resampling block sizes. These were
601: squares of
602: lengths 0.5, 0.33 and 0.25 for the 1 by 1 regions, of lengths 1,
603: 0.67, 0.5, 0.33 for the 2 by 2 regions and of lengths 2, 1, 0.67,
604: 0.5 for the 4 by 4 regions (solid, dashed, dotted and dashed-dotted
lines respectively for increasingly smaller blocks). The differences
due to the block size used for resampling appear to be small. As
mentioned, this is an advantage of the marked point
bootstrap. \citet{loh02a} found greater variation of performance with
block size for subsampling and block bootstrap.
610:
611: Compared with the Poisson empirical coverage, we find that at low
612: densities and smaller observation region sizes (plots towards the
613: upper left of Figure \ref{fig:HamPoiCoverage}), the bootstrap method
614: does poorly. However, the empirical coverage of the bootstrap
615: confidence intervals quickly increases towards 95\% with increasing
616: density (down the columns in Figure \ref{fig:HamPoiCoverage}) and/or
617: observation region size (across the rows in Figure
\ref{fig:HamPoiCoverage}), i.e.\ with larger sample sizes.
619:
620: \clearpage
621: \begin{figure}
622: \begin{center}
623: \plotone{f1.eps}
624: \caption{Plots of the empirical coverage of nominal 95\% confidence
625: intervals of the two-point correlation function for the Poisson
626: point process model. The estimator used is that of
627: \citet{hamilton93}. Confidence intervals are obtained using normal
628: approximation with Poisson errors (thick solid line) and with the
629: marked point bootstrap using different resampling block sizes.
630: The block sizes were squares of
631: lengths 0.5, 0.33 and 0.25 for the 1 by 1 regions, of lengths 1,
632: 0.67, 0.5, 0.33 for the 2 by 2 regions and of lengths 2, 1, 0.67,
633: 0.5 for the 4 by 4 regions (solid, dashed, dotted and dashed-dotted
634: lines respectively for increasingly smaller blocks).
635: }
636: \label{fig:HamPoiCoverage}
637: \end{center}
638: \end{figure}
639: \clearpage
640:
641: \clearpage
642: \begin{figure}
643: \begin{center}
644: \plotone{f2.eps}
645: \caption{Plots of the empirical coverage of nominal 95\% confidence
646: intervals of the two-point correlation function for the modified
647: Thomas process model for realizations in a $2\times 2$ square. The
648: estimator used is that of
649: \citet{hamilton93}. Confidence intervals are obtained using normal
650: approximation with Poisson errors (thick solid line) and with the
651: marked point bootstrap using different resampling block sizes (see text).}
652: \label{fig:HamTomCoverage}
653: \end{center}
654: \end{figure}
655: \clearpage
656:
657:
658:
659: \clearpage
660: \begin{figure}
661: \begin{center}
662: \plotone{f3.eps}
663: \caption{Plots showing the true (solid), Poisson (dotted) and
664: bootstrap (dashed) errors in estimates of $\xi$ for 500 sets
665: of data simulated in a 2 by 2 square region using each of various
666: point models. The true errors are obtained from the variability in
667: the estimates of $\xi$ over the 500 data sets. For each data set,
668: Poisson and bootstrap errors are computed. The errors shown in the
669: plots are the average over the 500 data sets.}
670: \label{fig:Booterrors}
671: \end{center}
672: \end{figure}
673: \clearpage
674:
675:
676:
677: \clearpage
678: \begin{figure}
679: \begin{center}
680: \caption{Plots showing sample realizations of the modified Thomas
681: model corresponding to four different sets of parameter values,
682: simulated on a $20\times 20$ square. The degree of clustering is
683: higher in the top row, while the range of
684: clustering is larger in the right column.}
685: \label{fig:r20Thomas}
686: \end{center}
687: \end{figure}
688:
689: \begin{figure}
690: \begin{center}
691: \plotone{f5.eps}
692: \caption{Plots of the coverage (left) and errors (right) for the
693: bootstrap (solid) and Poisson (dashed)
694: methods, based on 100
695: simulated realizations from the Thomas model on the
696: $20\times 20$ square. The thick solid lines in the plots on the
697: right column represent the true errors.}
698: \label{fig:r20simulation}
699: \end{center}
700: \end{figure}
701: \clearpage
702:
703: The top left plot of Figure \ref{fig:Booterrors} shows the Poisson
704: errors and bootstrap errors for the Poisson point process model
705: simulated on the $2\times 2$ square. The bootstrap errors shown in
706: this figure are from resampling with $0.33 \times 0.33$ squares. Also
707: shown in the plot are the true errors
708: as obtained from the estimates from 500 realizations. Notice that both
709: the Poisson and bootstrap errors are close to the true errors.
710:
711:
712: Figure \ref{fig:HamTomCoverage} shows similar plots for various
713: modified Thomas models, each with number density 500.
714: The general behavior with increasing
715: observation region size for the Poisson model occurs here as
716: well. Thus to reduce the number of plots, we only include plots for the 2
717: by 2 observation regions, and show the relative performance of the
718: Poisson and bootstrap confidence intervals.
719:
720:
721:
722: We find that when the point pattern is only weakly clustered (left
723: plot, for the case $\lambda_p=100, m=2.5$ and $\sigma=0.15$),
724: the Poisson confidence intervals had empirical coverage close to the
725: nominal 95\% level. However, as the other two plots in Figure
726: \ref{fig:HamTomCoverage} show, the empirical coverages of the Poisson
727: confidence intervals become lower than 95\% as the degree and/or range
728: of clustering increases (i.e.\ with smaller $\sigma$ or $\lambda_p$).
On the other hand, the bootstrap confidence intervals attain coverage
730: much closer to 95\% for all the cases shown, regardless of the degree
731: of clustering. Plots of the Poisson and bootstrap error estimates are
732: shown in Figure \ref{fig:Booterrors}. Notice that the Poisson
733: approximation underestimates the true error as the degree of
734: clustering increases, while the bootstrap error estimates remain close
735: to the true errors, even for the modified Thomas model.
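
As a rough check on why the first case is only weakly clustered, the closed-form expression for $\xi(r)$ of the modified Thomas model can be evaluated at $r=0$ for $\lambda_p=100$ and $\sigma=0.15$. This is arithmetic on the stated parameter values only and is not an additional simulation result:
$$\xi(0) = \frac{1}{4\pi \lambda_p \sigma^2} = \frac{1}{4\pi\,(100)\,(0.15)^2} = \frac{1}{9\pi} \approx 0.035 .$$
Since $\xi(r)$ decreases with $r$, the excess pair density over a Poisson pattern is below about 4\% at every separation, which is consistent with the Poisson confidence intervals performing well for this model.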
736:
737:
738:
739: Thus we find that the performance of the Poisson confidence intervals
740: is sensitive to the degree of clustering of the point pattern. If the point
741: pattern is Poisson, or weakly clustered, the empirical coverage of
742: Poisson confidence intervals is
743: close to the nominal level, even with small sample sizes. However,
744: performance quickly deteriorates with greater degree of clustering.
745: On the other hand, the bootstrap confidence interval does not perform
746: well with small sample sizes. With moderate sample
747: sizes, however, the bootstrap method performs rather well, over a wide
748: range in the degree of clustering.
749:
750: We performed an additional set of simulations using data sets of
751: roughly 50,000 points in a $20\times 20$
square and estimating $\xi(r)$ for $r= 0.01$ to 2. Other than the
restriction to 2D, the data size
and range of $r$ are roughly of the scale found in current astronomy
data. We used the modified Thomas model and chose four sets of
756: parameter values, varying the degree and range of clustering but
757: with the same number density. A sample
758: realization from each of the four models is shown in Figure
759: \ref{fig:r20Thomas}. The models corresponding to the top row in Figure
760: \ref{fig:r20Thomas} have higher clustering than the models on the
761: bottom row. For models in the same row, the strength of clustering is
762: similar, but the model on the right has a longer correlation range.
763: We used square resampling blocks of side length 5, 2.5 and 2 and
764: results were very similar.
765:
The results are summarized in Figure \ref{fig:r20simulation}, which
shows the empirical coverage of confidence intervals (left column) and
errors (right column) obtained from the marked point bootstrap and
with Poisson errors, for each of the four Thomas models.
770: The plots qualitatively show the same relative
771: performance between Poisson errors and bootstrap as found in the
772: earlier simulation study. When the range of
773: clustering is large, the empirical coverage of confidence intervals
774: based on Poisson errors and the normal approximation is very low
775: (second and fourth plots on the left column of Figure
\ref{fig:r20simulation}). The
coverage of the bootstrap intervals is affected too, but by much less.
When the correlation is large, the Poisson errors substantially
underestimate the true errors, while the marked point bootstrap errors are
more realistic. At the larger values of $r$, especially when $\xi$ is
781: near 0, the bootstrap procedure appears to be somewhat conservative,
782: while the Poisson errors become more accurate.
783:
784: We made some time
785: measurements of various sections of the algorithm: the functions
786: computing $DD$ and $DR$ took 1 minute and 13 minutes
787: respectively. Here, $N_R = 200,000$ and we did not use any
788: sophisticated methods (such as tree-based algorithms) to speed up the
789: computation. The bootstrap function, generating 999 samples and
790: computing the estimates, took roughly 1 minute, showing the
791: feasibility of the procedure for large data sets. The speed of the
792: marked point bootstrap is due to the fact that the marks that are
793: resampled have already been computed as part of the estimation. The
794: additional computational burden of the bootstrap is sampling the
795: points and keeping track of the number of times each point is
796: resampled.
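
The resampling step whose cost is discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation that was timed: the function name \texttt{marked\_point\_bootstrap} is hypothetical, blocks are placed with uniformly random corners that keep each block inside the region (so points near the boundary are selected less often), the number of blocks is chosen to match the area of the region, and each replicate is simply the sum of the marks of the resampled points, counted with multiplicity. Converting these sums into an estimate of $\xi$ depends on the estimator used and is left out.

```python
import numpy as np


def marked_point_bootstrap(points, marks, side, block, n_boot=999, rng=None):
    """Return bootstrap replicates of the summed marks.

    points : (n, 2) coordinates in [0, side]^2
    marks  : (n, k) precomputed marks, one row per point, one column per r
    block  : side length of the square resampling blocks
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    marks = np.asarray(marks, dtype=float)

    # Number of blocks whose total area matches the observation region.
    n_blocks = int(round((side / block) ** 2))
    replicates = np.empty((n_boot, marks.shape[1]))

    for b in range(n_boot):
        # counts[i] = number of times point i is resampled in this replicate.
        counts = np.zeros(len(points))
        corners = rng.uniform(0.0, side - block, size=(n_blocks, 2))
        for corner in corners:
            in_block = np.all(
                (points >= corner) & (points < corner + block), axis=1
            )
            counts += in_block
        # The marks are reused as they are; no pair counts are recomputed.
        replicates[b] = counts @ marks
    return replicates
```

Each replicate costs one membership test per point per block and a single weighted sum of marks, with no pair counting. A standard error for each value of $r$ could then be taken as \texttt{replicates.std(axis=0, ddof=1)}.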
797:
798:
799: \section{Discussion}
800: \label{sect:discussion}
801:
802: In this paper, we introduced the marked point bootstrap as a
803: method to bootstrap spatial data for estimating errors without
804: specific model assumptions. In particular,
805: we described how the method can be applied to estimators of the two-
806: and three-point correlation functions.
807: With the non-parametric bootstrap, errors are obtained from the actual
data. There is no need to choose a model, select parameter values or
809: generate mock catalogs using $N$-body simulations. Thus errors
810: obtained from non-parametric bootstrap can be used to compare with
811: errors obtained from other methods with more specific
812: model assumptions.
813:
814: For non-parametric spatial bootstrap, we propose the marked point bootstrap
815: over the more common block bootstrap or subsampling methods. There are
816: several advantages of the marked point bootstrap.
817: Firstly, by using information from actual pairs
818: or triplets of points in the data, bootstrap confidence intervals
819: using the marked point bootstrap attain better empirical coverage than
820: confidence intervals constructed using block bootstrap (see
821: \citealt{loh02a} for a comparison of these two methods).
822:
823: Secondly, in
824: the marked point bootstrap, it is the marks that are used to compute
825: the bootstrap estimate. Thus the resampled points do not have to be
826: arranged to form a new point pattern. This makes it a lot easier to
827: bootstrap data that are observed in irregularly shaped regions that are
828: common in astronomy. In \citet{loh02a} for example, bootstrap on an
829: absorber catalog was done using slices as well as spheres, with
830: similar results for both types of resampling regions.
831:
832: Thirdly, the marks used for resampling are part of the original
833: estimate and are computed during the estimation
834: step. The only additional
835: computation required by the marked point bootstrap involves selecting
836: points (that is, testing whether each point lies in a resampling region
837: or not), and keeping track of the number of times each point is
838: resampled. Unlike the block bootstrap, there is no need to
839: re-compute from scratch the estimates for each bootstrap sample.
840: This difference in computation is even greater for higher-order
841: statistics. These properties of the marked point bootstrap make it a
842: computationally feasible tool for analysis.
843:
844: Our study here suggests that
845: non-parametric bootstrap can yield valid estimates of errors under a
846: wide range of point patterns. The lack of specific model assumptions
847: means that the non-parametric bootstrap method, and in particular the
848: marked point bootstrap, can serve as an alternate and complementary method for
849: quantifying errors. Having estimates of errors obtained using Poisson
850: approximations, parametric and non-parametric bootstrap allows one to
851: have a better sense of the size of errors involved in an analysis.
852:
The simulation study performed here shows that, when sample sizes are
large, bootstrap confidence intervals attain coverage close
to the nominal level, even for the clustered point patterns
where Poisson errors are known to be inaccurate.
More specifically, bootstrap performance improves with
858: increasing number density, and also with increasing observation region
859: size relative to the correlation length.
860: Unfortunately, in astronomy, the correlation length may be of the same
861: scale as the observation region. If the values of $r$ at which the
862: correlation function estimates are computed are small relative to the
863: resampling blocks (and the observation region), then although the
864: bootstrap procedure would distort the dependence structure at the
865: large scales, it would still be valid for these smaller values of $r$.
866:
If, instead,
$\xi(r)$, say, for $r$ close to the size of the observation region
is of interest, then the bootstrap procedure would start to
break down, in the sense that the empirical coverage of confidence
intervals may not be close to the nominal level, and the bootstrap errors
may not reflect the true errors. In this case, the amount of information
873: contained in the data is smaller and the boundary effects are
874: magnified. With respect to the marked point bootstrap, larger blocks
875: are needed to capture the dependence structure at this larger
876: scale. For a fixed sample,
877: this cannot be done without reducing the variability of the bootstrap samples.
878: The method that might work best is parametric bootstrap, assuming that
879: the model is correct, and that the parameter values used are close to
880: the true values.
881: Non-parametric bootstrap can still be useful here. Firstly, it is at
882: least a better choice than Poisson errors, since the latter would
883: grossly underestimate the true errors. Secondly, it can provide additional
884: error estimates to compare with the errors obtained with the assumed
885: model. For these most challenging instances, having a variety of
886: methods can only be beneficial.
887:
888:
889:
890:
891:
892: \acknowledgments
893:
894: This research is supported in part by
895: National Science Foundation award AST-0507687.
896:
897:
898: \begin{thebibliography}{30}
899: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
900:
901: \bibitem[{B{\"u}hlmann \& K{\"u}nsch(1999)}]{buhlmann99}
902: B{\"u}hlmann, P., \& K{\"u}nsch, H.~R. 1999, Computational Statistics and Data
903: Analysis, 31, 295
904:
905: \bibitem[{Davis \& Peebles(1983)}]{davis83}
906: Davis, M., \& Peebles, P. J.~E. 1983, Astrophysical Journal, 267, 465
907:
908: \bibitem[{Davison \& Hinkley(1997)}]{davison97}
Davison, A.~C., \& Hinkley, D.~V. 1997, Bootstrap Methods and their
  Application (Cambridge: Cambridge University Press)
911:
912: \bibitem[{Diggle(2003)}]{diggle03}
913: Diggle, P.~J. 2003, Statistical Analysis of Spatial Point Patterns, 2nd edn.
914: (London: Arnold)
915:
916: \bibitem[{Efron \& Tibshirani(1994)}]{efron94}
917: Efron, B., \& Tibshirani, R. 1994, An Introduction to the Bootstrap (New York:
918: Chapman and Hall/CRC)
919:
920: \bibitem[{Eisenstein {et~al.}(2005)Eisenstein, Zehavi, Hogg, \&
921: Scoccimarro}]{eisenstein05}
922: Eisenstein, D.~J., Zehavi, I., Hogg, D.~W., \& Scoccimarro, R. 2005,
923: Astrophysical Journal, 633, 560
924:
925: \bibitem[{Hall(1985)}]{hall85}
926: Hall, P. 1985, Stochastic Processes and their Applications, 20, 231
927:
928: \bibitem[{Hamilton(1993)}]{hamilton93}
929: Hamilton, A. J.~S. 1993, Astrophysical Journal, 417, 19
930:
931: \bibitem[{Hewett(1982)}]{hewett82}
Hewett, P.~C. 1982, Monthly Notices of the Royal Astronomical Society, 201, 867
933:
934: \bibitem[{Jing \& B\"{o}rner(1998)}]{jing98}
935: Jing, Y.~P., \& B\"{o}rner, G. 1998, Astrophysical Journal, 503, 37
936:
937: \bibitem[{Kemball \& Martinsek(2005)}]{kemball05}
938: Kemball, A., \& Martinsek, A. 2005, Astronomical Journal, 129, 1760
939:
940: \bibitem[{Kerscher {et~al.}(2000)Kerscher, Szapudi, \& Szalay}]{kerscher2000}
941: Kerscher, M., Szapudi, I., \& Szalay, A.~S. 2000, Astrophysical Journal
942: Letters, 535, 13
943:
944: \bibitem[{Kulkarni {et~al.}(2007)Kulkarni, Nichol, Sheth, Seo, Eisenstein, \&
945: Gray}]{kulkarni07}
946: Kulkarni, G.~V., Nichol, R.~C., Sheth, R.~K., Seo, H.-J., Eisenstein, D.~J., \&
947: Gray, A. 2007, Monthly Notices of the Royal Astronomical Society, 378, 1196
948:
949: \bibitem[{K{\"u}nsch(1989)}]{kunsch89}
950: K{\"u}nsch, H.~R. 1989, Annals of Statistics, 17, 1217
951:
952: \bibitem[{Lahiri(2003)}]{lahiri03}
953: Lahiri, S.~N. 2003, Resampling Methods for Dependent Data (New York: Springer)
954:
955: \bibitem[{Landy \& Szalay(1993)}]{landy93}
956: Landy, S.~L., \& Szalay, A.~S. 1993, Astrophysical Journal, 412, 64
957:
958: \bibitem[{Liu \& Singh(1992)}]{liu92}
959: Liu, R.~Y., \& Singh, K. 1992, in Exploring the Limits of Bootstrap, ed.
960: R.~LePage \& L.~Billard (New York: Wiley), 225--248
961:
962: \bibitem[{Loh \& Stein(2004)}]{loh02a}
963: Loh, J.~M., \& Stein, M.~L. 2004, Statistica Sinica, 14, 69
964:
965: \bibitem[{{McCracken} {et~al.}(2007){McCracken}, Peacock, Guzzo, Capak,
966: Porciani, Scoville, Aussel, Finoguenov, James, Kitzbichler, Koekemoer,
967: Leauthaud, {Le F\`{e}vre}, Massey, Mellier, Mobasher, Norberg, Rhodes,
968: Sanders, Sasaki, Taniguchi, Thompson, White, \& {El-Zant}}]{mccracken07}
969: {McCracken}, H.~J., Peacock, J.~A., Guzzo, L., Capak, P., Porciani, C.,
970: Scoville, N., Aussel, H., Finoguenov, A., James, J.~B., Kitzbichler, M.~G.,
971: Koekemoer, A., Leauthaud, A., {Le F\`{e}vre}, O., Massey, R., Mellier, Y.,
972: Mobasher, B., Norberg, P., Rhodes, J., Sanders, D.~B., Sasaki, S.~S.,
973: Taniguchi, Y., Thompson, D.~J., White, S. D.~M., \& {El-Zant}, A. 2007,
974: Astrophysical Journal Supplement, 172, 314
975:
976: \bibitem[{Neyman \& Scott(1952)}]{neyman52}
977: Neyman, J., \& Scott, E.~L. 1952, Astrophysical Journal, 116, 144
978:
979: \bibitem[{Peebles \& Groth(1975)}]{peebles75}
980: Peebles, P. J.~E., \& Groth, E.~J. 1975, Astrophysical Journal, 196, 1
981:
982: \bibitem[{Politis \& Romano(1993)}]{politis93a}
983: Politis, D.~N., \& Romano, J.~P. 1993, Journal of Multivariate Analysis, 47,
984: 301
985:
986: \bibitem[{Politis {et~al.}(1999)Politis, Romano, \& Wolf}]{politis99}
987: Politis, D.~N., Romano, J.~P., \& Wolf, M. 1999, Subsampling (Berlin: Springer)
988:
989: \bibitem[{Ripley(1988)}]{ripley88}
990: Ripley, B.~D. 1988, Statistical Inference for Spatial Processes (New York:
991: Wiley)
992:
993: \bibitem[{Shen {et~al.}(2007)Shen, Strauss, Oguri, \& et~al.}]{shen07}
994: Shen, Y., Strauss, M.~A., Oguri, M., \& et~al. 2007, Astronomical Journal, 133,
995: 2222
996:
997: \bibitem[{Simpson \& Mayer-Hasselwander(1986)}]{simpson86}
998: Simpson, G., \& Mayer-Hasselwander, H. 1986, Astronomy and Astrophysics, 162,
999: 340
1000:
1001: \bibitem[{Snethlage(1999)}]{snethlage99}
1002: Snethlage, M. 1999, Metrika, 49, 245
1003:
1004: \bibitem[{Stoyan {et~al.}(1995)Stoyan, Kendall, \& Mecke}]{stoyan95}
1005: Stoyan, D., Kendall, W.~S., \& Mecke, J. 1995, Stochastic Geometry and Its
1006: Applications, \textnormal{2nd edition} (New York: John Wiley)
1007:
1008: \bibitem[{Szapudi \& Szalay(1998)}]{szapudi98}
1009: Szapudi, I., \& Szalay, A.~S. 1998, Astrophysical Journal, 494, L41
1010:
1011: \bibitem[{Waagepetersen(2007)}]{waag07}
1012: Waagepetersen, R. 2007, Biometrics, 63, 252
1013:
1014: \end{thebibliography}
1015:
1016:
1017:
1018: \end{document}
1019:
1020: