0706:0706.1762/PLC.tex

1: \documentclass{article}

2: \usepackage{epsf,graphicx}

3: % --- page style definitions ---

4:

5: \setlength{\textwidth}{17cm}

6: \setlength{\textheight}{20cm}

7: \setlength{\oddsidemargin}{0pt}

8: \setlength{\evensidemargin}{0pt}

9: \setlength{\topmargin}{0pt}

10:

11: \newlength{\refindent}

12: \setlength{\refindent}{\parindent}

13: \newlength{\parskiplen}

14: \setlength{\parskiplen}{2.5mm}

15:

16: \def\btheta{\mbox{\boldmath$\theta$}}

17:

18: \begin{document}

19:

20: % --- reference and figure caption environments ---

21:

22: \newenvironment{references}{\clearpage

23: 			    \section*{\large \bf REFERENCES}

24: 			    \parindent=0mm \everypar{\hangindent=3pc

25: 			    \hangafter=1}}{\parindent=\refindent \clearpage}

26: \newenvironment{figcaps}{\clearpage

27: 			 \section*{\large  \bf FIGURE CAPTIONS}}{}

28: \newcommand{\fig}[2]{\parbox[t]{2.0cm}{Figure #1:} \

29: 		   \parbox[t]{13.5cm}{#2}\\[\baselinestretch\parskiplen]}

30:

31: % --- title ---

32:

33: \begin{titlepage}

34: \begin{center}

35: \vspace*{0.5cm}

36: {\huge The Detailed Forms of the LMC Cepheid}\\[0.5cm]

37: {\huge PL and PLC Relations}\\[3.0cm]

38:

39: {\large C. Koen$^1$, S. Kanbur$^2$ and C. Ngeow$^3$}\\[1cm]

40: \normalsize

41: {\em  1 Department of Statistics, University of the Western Cape,

42: Private Bag X17, Bellville, 7535 Cape, South Africa}\\[0.6cm]

43: {\em  2 Department of Physics, State University of New York at Oswego,

44: Oswego, NY 13126, USA}\\[0.6cm]

45: {\em  3 Department of Astronomy, University of Illinois,

46: Urbana-Champaign, IL 61801, USA}\\[2.0cm]

47:

48: \end{center}

49:

50:

51:

52: %\setlength{\baselineskip}{0.8cm}

53: \begin{quotation}\noindent{\bf ABSTRACT.}

54: Possible deviations from linearity of the LMC Cepheid PL and PLC relations are

55: investigated. Two datasets are studied, respectively from the OGLE and MACHO

56: projects. A nonparametric test, based on linear regression residuals,

57: suggests that neither PL relation is linear. If colour dependence is allowed for

58: then the MACHO PL relation is found to deviate more significantly from the linear,

59: while the OGLE PL relation is consistent with linearity. These findings are confirmed

60: by fitting ``Generalised Additive Models'' (nonparametric regression functions)

61: to the two datasets. Colour dependence is shown to be nonlinear in both datasets,

62: distinctly so in the case of the MACHO Cepheids. It is also shown that there is

63: interaction between the period and colour functions in the MACHO data.

64:

65: \vspace*{1.0cm}

66: {\bf Key words:} methods: statistical - stars: variables: Cepheids -

67: cosmology: distance scale

68: \end{quotation}

69:

70:

71:

72: \end{titlepage}

73:

74: \section{INTRODUCTION}

75:

76: Cepheids are important objects in astrophysics, both because of their use in the

77: extra-galactic distance

78: scale and their role in stellar evolution. Their regularly repeating

79: light curves offer an important opportunity

80: to test theories of stellar evolution against stellar pulsation: mass-luminosity

81: (ML) relations mandated from evolutionary calculations

82: can be used as input to full linear and non-linear hydrodynamic models of

83: Cepheids and compared to observations. These ML relations contain

84: input about evolutionary physics such as the amount of convective overshoot.

85: Constraining theoretical models with observations can be used to gain

86: considerable insight into

87: evolutionary/pulsation physics. On the other hand the Cepheid period-luminosity

88: (PL) relation has played an important role in establishing the

89: extra-galactic distance scale and the subsequent estimation of Hubble's constant,

90: $H_0$. The $HST$ Key Project (Freedman et al. 2001) has used $HST$ observations of

91: Cepheids in a number of galaxies to estimate $H_0$ to within $10\%$ accuracy.

92: The crucial step in this work has been the Cepheid PL relation in the

93: Large Magellanic Cloud (LMC) which has been used to characterize a Cepheid PL

94: relation template. This PL template has

95: traditionally been thought to be linear; however, there has also been recent work

96: implying a variation of the slope with period in the LMC (Tammann \& Reindl 2002;

97: Kanbur \& Ngeow 2004, 2006; Sandage et al. 2004; Kanbur et al. 2007a;

98: Ngeow et al. 2005; Ngeow \& Kanbur 2006a,b).

99:

100: Ngeow and Kanbur (2006c) estimate the error incurred in $H_0$ if a linear

101: Cepheid PL relation is assumed

102: while the underlying relation is ``non-linear''

103: at a period of 10 days, and find that this can lead to an error of about 1--2\%.

104: Such an error seems small but with significant

105: work being carried out to reduce zero point errors (Macri et al. 2006), it is

106: important to construct as accurate a distance scale as possible that is independent of

107: the CMB. Further, Table 2 of Spergel et al. (2007) points to the fact that an

108: independent estimate of $H_0$, accurate to less than

109: $5\%$, will help to break the degeneracy between ${\Omega}_{matter}$ and $H_0$

110: present from WMAP CMB studies. An independent estimate

111: of $H_0$ accurate to $1\%$ will result in a reduction of the $65\%$ confidence

112: interval on ${\Omega}_{matter}$ by almost a factor of two over

113: that with WMAP data alone.

114:

115: In previous studies, a rigorous statistical test, the $F$ test, was

116: applied to the LMC Cepheids to test for the linear versus non-linear

117: PL relation. Here by ``non-linear'' we mean two lines of significantly

118: differing slope which are continuous at a period of 10 days. The $F$ test

119:  results that were obtained from the OGLE (Optical Gravitational Lensing Experiment,

120: Udalski et al. 1999) and MACHO Cepheid data, in Kanbur \& Ngeow (2004; 2006)

121: and Ngeow et al. (2005) respectively, strongly imply that the LMC PC/PL

122: relations are non-linear. It is important to note that several other

123: statistical tests, such as the ${\chi}^2$ tests, least absolute deviation,

124: robust estimation and loess procedures, were also applied to the MACHO data,

125: and these results also point to a non-linear LMC PL relation

126: (Ngeow et al. 2005). Recently, Kanbur et al. (2007a) developed the use of

127: testimators and a likelihood based method using the Schwarz Information

128: Criterion, to study non-linearities in the LMC PL relation (using both

129: OGLE and MACHO Cepheid data) and again came to the same conclusion: the

130: LMC Cepheid PL relation is non-linear in the sense described above. The

131: $F$ test also suggested that the LMC period-colour (PC) relation is

132: non-linear, in contrast to the Galactic and SMC (Small Magellanic Cloud)

133: PC relations (Kanbur \& Ngeow 2004). Since the question of the non-linearity

134: of the LMC PL relation is important in distance scale and stellar studies,

135: it is vital to establish this as firmly as possible; this is one of

136: the motivations for this paper.

137:

138: In addition to investigating the non-linearity of the LMC PL relation, we

139: also study the LMC period-luminosity-colour (PLC) relation.

140: A number of authors, including Sandage (1958) and Madore and Freedman (1991)

141: have derived the PLC relation and shown how it arises from

142: the period-mean density theorem, the Stefan-Boltzmann law and the existence of

143: an instability strip. These authors also point out that the PL/PC relations are

144: obtained from the PLC relation by averaging over the variable not included

145: in the relation.

146:

147: In Section 2, we briefly describe the

148: data used in our study. In Section 3 we apply a preliminary test

149: to the LMC PL relation. This is followed by a more detailed analysis in Section 4,

150: based on a non-parametric model fitting procedure.

151: An extension to the PLC relation is presented in Section 5.

152: The conclusion and discussion of our results are given in Section 6.

153:

154: We add a few sentences on the use of non-parametric

155: methods in what follows. The term ``non-parametric'' is actually used in three slightly

156: different senses. First, the major innovation (Sections 4 and 5) in this paper is the

157: use of ``non-parametric regression''. The meaning is {\it not} necessarily the usual

158: one of ``distribution-free'': rather, it means that the form of the regression is not

159: specified -- the regression function is ``unstructured'', being dictated by the data

160: itself. Of course, this flexibility allows one to detect subtleties which may

161: otherwise be overlooked. Second, in Section 3 we use a well-known

162: distribution-free statistic, the ``Wald-Wolfowitz runs test''. This non-parametric

163: statistic uses only data ranks, and hence is typically not very powerful. Third, also

164: in Section 3, use is made of a permutation method. This avoids

165: distributional assumptions about the data, by

166: using re-orderings of the data itself to establish significance levels.

167:

168: \section{THE DATA}

169:

170: We use two sets of LMC Cepheid data in our study. The first data set is the

171: extinction corrected $V$-band mean magnitudes and $(V-I)$ colours for the OGLE

172: LMC Cepheids taken from Kanbur \& Ngeow (2006), supplemented with additional

173: Cepheids from Sebo et al. (2002), and referred to as the ``OGLE'' data in this paper.

174: The second data set is the MACHO Cepheids data, with extinction corrected $V$

175: mean magnitudes and $(V-R)$ colours, adopted from Ngeow et al. (2005).

176: Using these two data sets allows us to compare the results, particularly for the

177: different photometric filters used.

178:

179: A possible complication is that any apparent non-linearity in PL or PLC

180: relations could be caused by

181: extinction errors which are a function of colour or period.

182: Arguments against extinction errors as a cause of observed non-linear LMC PL

183: and PC relations were presented in Kanbur \& Ngeow (2004), Kanbur \& Ngeow (2006),

184: Kanbur et al. (2007b), Ngeow et al. (2005), Ngeow \& Kanbur (2006b) and

185: Sandage et al. (2004), and will therefore not be repeated in detail here.

186: In particular, a possible period dependency of extinction errors has been

187: investigated in Ngeow \& Kanbur (2006b). If such extinction errors were present,

188: then the PC relations at maximum light would be such that LMC Cepheids would get

189: hotter at maximum light as the pulsation period increases: a fact which would

190: be hard to reconcile with pulsation theory especially as Galactic Cepheids,

191: in common with LMC Cepheids, display a flat PC relation at

192: maximum light (Kanbur \& Ngeow 2004, 2006). Further, the dependence of extinction

193: error on colour would need to

194: be very complicated to explain both the non-linearity at mean light whilst

195: preserving the flatness at maximum light.

196:

197: It is also noted that the reddening values adopted here are the {\it same}

198: as those used in many distance scale studies (Freedman et al. 2001).

199:

200: \section{A PRELIMINARY INVESTIGATION BASED ON A TEST PROCEDURE}

201:

202: Figs. 1 and 2 show the MACHO and OGLE PL data, with least squares

203: linear fits of the form

204: \begin{equation}

205: V=a+b \log P +{\rm error} \; .

206: \end{equation}

207: For the sake of completeness,

208: \begin{eqnarray}

209: V&=&17.08(0.026)-2.70(0.039) \log P \;\;\;\;\;(MACHO)\nonumber\\

210: V&=&17.05(0.020)-2.69(0.028) \log P \;\;\;\;\;\;(OGLE)

211: \end{eqnarray}

212: where standard errors of coefficient estimates are given in brackets.

213: Although both fits are excellent, it is nonetheless

214: of some interest whether there may be subtle deviations from the strictly

215: linear relations between $V$ and $\log P$ shown by the lines:

216: although this may have little importance for prediction of luminosity

217: given the period, it could (e.g.) have an important bearing on the modelling

218: of Cepheid pulsations.

219:

220: A simple procedure which provides some insight into the problem is

221: to study partial sums of the residuals of the least squares fits.

222: First arrange the data so that the period values are in ascending order:

223: $$P_1 <P_2<P_3<\ldots <P_N $$

224: where $N$ is the sample size. Then

225: \begin{equation}

226: C(j)=\sum_{k=1}^j [V_k-a-b \log P_k]=\sum_{k=1}^j r_k

227: \end{equation}

228: are the partial sums of the residuals $r_k$. If there are no deviations

229: from linearity, then $C(j)$ is the sum of uncorrelated random numbers and

230: hence a simple random walk. However, if there are deviations

231: from linearity successive residuals may be correlated, and hence $C(j)$

232: will not be a simple random walk. Partial sums of the $r_k$ can be

233: seen in Figs. 3 and 4.

234:

235: A statistic which can be used for testing whether the partial sum

236: is a pure random walk is its vertical range

237: $$R=\max_j C(j)-\min_j C(j) \; :$$

238: this may be expected to be inflated by positively correlated residuals. Significance

239: levels for the values of $R$ are readily obtained by permutation, as

240: follows:

241: \begin{itemize}

242: \item[(i)]

243: Permute the $r_k$; this will randomise the residuals by destroying any

244: possible trends.

245: \item[(ii)]

246: The partial sums of the permuted $r_k$ will be true random walks --

247: find the statistic $R$ for the permutation.

248: \item[(iii)]

249: Repeat steps (i) and (ii) a large number of times, noting the values of

250: $R$.

251: \item[(iv)]

252: Determine the fraction of permutation $R$-values which exceeds the observed

253: value -- this estimates the significance level of the observed $R$.

254: \end{itemize}

255: Applying 10000 permutations, significance levels of 3\% and 4\% were

256: obtained for the MACHO and OGLE data respectively, suggesting

257: meaningful deviation of the observed $r_k$ from randomness. The implication

258: is therefore that the PL relation is not perfectly linear.
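The permutation estimate in step (iv) can be written compactly. In the following sketch, $B$ (the number of permutations) and $R^*_b$ (the range obtained from the $b$-th permutation) are notation introduced here for illustration:
$$\widehat{p}=\frac{1}{B}\sum_{b=1}^{B} I(R^*_b > R_{\rm obs}) \; ,$$
where $I(\cdot)$ is the indicator function and $R_{\rm obs}$ is the range of the observed partial sums. If the permutations are treated as independent draws, the sampling error of $\widehat{p}$ is approximately $\sqrt{\widehat{p}(1-\widehat{p})/B}$; for $B=10000$ and $\widehat{p}=0.03$ this is about 0.002, so the quoted levels should be stable to a few tenths of a percentage point.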

259:

260: Study of Figs. 3 and 4 shows that there is an excess of positive residuals

261: for $\log P \sim 0.5$ and $\log P>1$, and an excess of negative values

262: for $0.8<\log P<1$.

263:

264: Interestingly, application of the standard Wald-Wolfowitz runs test

265: (e.g. Conover 1971) for randomness of the residuals gives conflicting results

266: for the two datasets -- significance levels of 45\% and 0.9\% for

267: the OGLE and MACHO data respectively. Of course, the procedure

268: uses only the signs, and not the sizes, of the $r_k$.
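For reference, the large-sample form of the runs test is sketched below; the symbols $n_+$, $n_-$ and $U$ are notation introduced here. With $n_+$ positive and $n_-$ negative residuals ($n=n_++n_-$), and $U$ the number of runs of like sign when the residuals are taken in order of increasing period, randomness implies
$$E(U)=\frac{2n_+n_-}{n}+1 \; , \;\;\;\;\; {\rm var}(U)=\frac{2n_+n_-(2n_+n_--n)}{n^2(n-1)} \; ,$$
and $[U-E(U)]/\sqrt{{\rm var}(U)}$ is approximately standard normal. Systematic deviations from linearity produce long stretches of residuals of the same sign, and hence fewer runs than expected.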

269:

270: It is known that Cepheids follow a PLC, rather than simply a PL,

271: relation. It may therefore be prudent to replace (3) by

272: \begin{equation}

273: C(j)=\sum_{k=1}^j [V_k-a-b \log P_k-c(CI)_k]

274: \end{equation}

275: where $(CI)$ indicates a colour index, with regression coefficient

276: $c$. This has a substantial influence on the significance levels

277: of the statistic $R$: for the OGLE data it increases to 33\%, while

278: the level for the MACHO data is reduced to 0.7\%. The corresponding

279: Wald-Wolfowitz test levels are 43\% and 1.5\%.

280:

281: To summarise, there is strong evidence of non-randomness in the residuals

282: of the MACHO data, both for the PL and the PLC relations. For the OGLE

283: data the results are ambiguous.

284:

285:

286: \section{PL RELATION}

287:

288: An alternative to the imposition of a fully specified parametric

289: model such as (1) is to allow the form of the regression to be

290: dictated by the data. The idea is conveniently illustrated by

291: a technique known as ``loess'' (see e.g. Cleveland \& Devlin 1988).

292: Ngeow et al. (2005)

293: initially used this method on MACHO data and found a similar result to

294: that reported here. Here we study it in more detail and apply it to both

295: MACHO and OGLE Cepheid data.

296: The method entails fitting a low order polynomial (in the present

297: case a straight line) over restricted sections (``windows'') of the

298: data by weighted least squares. In the implementation here the only

299: free parameter is the width $\alpha$ of the window, which is usually

300: given as a fraction of the range of the independent variable (i.e.

301: $0<\alpha \le 1$). The smaller $\alpha$,

302: the more ``local'' the estimated regression, and the more detail

303: it shows. Fig. 5 shows a loess regression of the OGLE data, using

304: $\alpha=0.05$; if $\alpha$ is increased towards unity the loess regression

305: resembles the linear fit of Fig. 2.
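The local fit can be written as a weighted least squares problem. The following is a sketch that assumes a generic weight function $W$ which decreases with distance and vanishes outside the window (the tricube function is a common choice in loess implementations); $h$ denotes the window half-width implied by $\alpha$. At a target value $x$ of $\log P$ the estimate is $\widehat{V}(x)=\widehat{a}_x$, where
$$(\widehat{a}_x,\widehat{b}_x)=\arg\min_{a,b}\sum_{k=1}^N W\left(\frac{|\log P_k-x|}{h}\right)\left[V_k-a-b(\log P_k-x)\right]^2 \; .$$
As the window grows to cover all the data with nearly equal weights, the solution tends to the global least squares line.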

306:

307: A key element is then obviously the choice of window width $\alpha$, and

308: it is desirable to use an objective method to find it. This is readily

309: done by ``cross-validation'':

310: \begin{itemize}

311: \item[(i)]

312: Choose a value of the window width $\alpha$.

313: \item[(ii)]

314: Leave out the first datapoint and obtain a loess estimate

315: $\widehat{V}_1$ of the magnitude $V_1$

316: by fitting the regression to the remaining data.

317: \item[(iii)]

318: Note the discrepancy

319: $$\Delta_1=V_1-\widehat{V}_1$$

320: between the true and predicted values.

321: \item[(iv)]

322: Repeat steps (ii)-(iii) for the second, third,..., last datapoints,

323: giving the set $\Delta_1, \Delta_2,\ldots,\Delta_N$ of discrepancies.

324: \item[(v)]

325: The value of the cross-validation criterion for the value of $\alpha$

326: from (i) is then defined as

327: \begin{equation}

328: CV(\alpha)=\frac{1}{N} \sum_{j=1}^N \Delta_j^2

329: =\frac{1}{N} \sum_{j=1}^N (V_j-\widehat{V}_j)^2

330: \end{equation}

331: Clearly, it evaluates the predictive power over all the observations

332: of the loess fit based on the particular value of $\alpha$.

333: \item[(vi)]

334: Repeat steps (i)-(v) for all candidate values of $\alpha$.

335: \item[(vii)]

336: The optimal $\alpha$ is that which minimises $CV(\alpha)$.

337: \end{itemize}
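Steps (ii)-(iv) need not be carried out by brute force. A sketch, assuming that the window does not change when a point is omitted, so that the loess fit is a linear smoother with full-data fitted values $\widetilde{V}_j=\sum_k S_{jk}V_k$ (the matrix $S$ is notation introduced here): the standard leave-one-out identity for weighted least squares then gives
$$\Delta_j=\frac{V_j-\widetilde{V}_j}{1-S_{jj}} \; ,$$
so that $CV(\alpha)$ can be obtained from a single fit for each value of $\alpha$. With nearest-neighbour windows the identity is only approximate.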

338:

339: The cross-validation functions for the two datasets are plotted in Fig. 6;

340: optimal window widths are 0.36 and 0.20

341: respectively for the MACHO and OGLE observations. In Figs. 7 and

342: 8 the resultant loess

343: functions are compared to the regression lines from (1). A small difference

344: between the curves over the approximate interval $0.8<\log P<1$ is visible

345: in both diagrams. There is also a substantial disagreement at the longest

346: periods for the MACHO results in Fig. 7: this is clearly due to the

347: {\it systematic} difference between the data and the linear regression

348: line for $\log P>1.25$ (see Fig. 1). Similarly, the slight divergence

349: between the loess and linear regression lines at the longest periods

350: in Fig. 8, can be traced to the influence of the two OGLE datapoints with

351: $\log P>1.7$ (see Fig. 2).

352:

353: The question arises as to whether the discrepancies between the loess

354: curves and the straight line fits are at all meaningful. In order

355: to address this issue confidence intervals for the loess curves are

356: estimated by bootstrapping (e.g. Efron \& Tibshirani 1993). The results,

357: based on 5000 bootstrap samples, are plotted in Figs. 9 and 10.

358: Rather than showing the linear regression line and the 95\% upper and

359: lower limits, the {\it differences} between the linear fit and the

360: confidence limits are plotted, in order to display

361: the deviations more clearly. It is notable that the linear fits lie outside the

362: confidence intervals for the loess functions for roughly $0.8<\log P<1$.

363: This supports previous work which has suggested a ``break'' around a

364: period $\log P \approx 1$

365: (Kanbur \& Ngeow 2004; Ngeow et al. 2005; Kanbur et al. 2007a).
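One common way of constructing bootstrap confidence intervals of this kind is the percentile method applied to resampled data pairs; whether this particular variant was used here is an assumption made for illustration. For $b=1,\ldots,B$, draw $N$ pairs $(\log P_k,V_k)$ with replacement, refit the loess curve to obtain $\widehat{V}^*_b(x)$, and at each $x$ take
$$[\widehat{V}^*_{(0.025B)}(x),\;\widehat{V}^*_{(0.975B)}(x)]$$
as the 95\% interval, where $\widehat{V}^*_{(m)}(x)$ is the $m$-th smallest of the $B$ bootstrap values. For $B=5000$ these are the 125th and 4875th ordered values.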

366:

367: The {\textsc R} software add-on package ``mgcv'' contains an alternative

368: nonparametric regression facility in the form of thin plate regression

369: splines (TPRS) (e.g. Wood 2006). The form of cross-validation used is based

370: on a balance between the sum of squared model residuals (which measures

371: the goodness of the model fit) and a smoothness term. Cross-validation in

372: mgcv is automated.
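The balance referred to can be made explicit. In a sketch of the one-dimensional case (the symbols $\lambda$ and $A$ are notation introduced here), the smooth function $f$ minimises a penalised sum of squares
$$\sum_{j=1}^N [V_j-f(\log P_j)]^2+\lambda\int [f''(x)]^2 dx \; ,$$
where the smoothing parameter $\lambda \ge 0$ controls the trade-off: $\lambda \rightarrow \infty$ forces a straight line, while $\lambda \rightarrow 0$ allows $f$ to follow the data closely. A widely used criterion for choosing $\lambda$ is generalised cross-validation, which minimises $N \, {\rm RSS}/[N-{\rm tr}(A)]^2$, with RSS the residual sum of squares and $A$ the matrix mapping the data onto the fitted values; ${\rm tr}(A)$ is the effective degrees of freedom. Which criterion is actually applied depends on the package settings.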

373:

374: The loess and TPRS results are compared for the MACHO and OGLE data respectively

375: in Figs. 11 and 12. The agreement is very good -- in particular, the deviations

376: from linearity for $0.8<\log P<1$ are also evident in the TPRS results.

377: Despite the fact

378: that more effective degrees of freedom are required for the nonparametric

379: fits (6.41 and 8.71 for the TPRS fits to the MACHO and OGLE data respectively)

380: than for linear regression (3 degrees of freedom), the former fits follow

381: the data considerably more closely. Model selection tools such as the

382: ``Akaike Information Criterion'' (AIC, e.g. Burnham \& Anderson 2002) can be used to test

383: whether the improved model fit warrants the additional degrees of freedom expended.

384: In this case, the TPRS fits are both preferred by very wide margins.
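For a model with Gaussian errors the criterion takes a simple form, sketched here with $k$ denoting the (effective) number of parameters and RSS the residual sum of squares, both notation introduced for illustration:
$${\rm AIC}=-2\ln L_{\max}+2k=N\ln({\rm RSS}/N)+2k+{\rm constant} \; ,$$
where the constant depends only on $N$. The model with the smaller AIC is preferred, so additional degrees of freedom are justified only if they reduce $N\ln({\rm RSS}/N)$ by more than twice the increase in $k$.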

385:

386:

387: \section{PLC RELATION}

388:

389: Unusual datapoints can have substantial, often somewhat distorting,

390: influences on regression surfaces. It is therefore worthwhile examining

391: the datasets carefully in order to identify such data. This is most

392: easily done using ordinary multiple linear least squares regression.

393:

394: Fitting PLC relations to the two datasets gives the results

395: \begin{eqnarray}

396: V&=&16.23(0.026)-3.30(0.029) \log P +3.95(0.093)(V-R)\;\;\;\;\;(MACHO)

397: \nonumber\\

398: V&=&15.97(0.025)-3.23(0.018) \log P +2.30(0.049)(V-I)\;\;\;\;\;\;(OGLE)

399: \end{eqnarray}

400: with residual standard deviations 0.164 and 0.097 mag.

401: Regression diagnostics were examined in order to identify observations

402: which gave rise to large residuals and/or were unduly influential on

403: parameter estimates. ``Cook's $D$'' statistic was used for the latter

404: purpose -- see e.g. Montgomery, Peck \& Vining (2001) (or almost any

405: other modern text devoted to linear regression theory). Three points were

406: eliminated

407: from the MACHO data, and four from the OGLE data, on the basis of these

408: diagnostics.

409: The PLC relations were then re-estimated for the reduced datasets, and the

410: new sets of diagnostics examined. This led to a further two deletions from

411: the OGLE data. The final results, replacing (6), are

412: \begin{eqnarray}

413: V&=&16.23(0.026)-3.32(0.029) \log P +4.00(0.092)(V-R)\;\;\;\;\;(MACHO)

414: \nonumber\\

415: V&=&15.89(0.021)-3.29(0.015) \log P +2.48(0.041)(V-I)\;\;\;\;\;\;(OGLE)

416: \end{eqnarray}

417: with residual standard deviations of 0.162 and 0.074 mag. The substantial

418: reduction in residual variance, and large changes in regression

419: coefficients for the OGLE results are particularly striking.
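For reference, Cook's $D$ for the $i$-th observation can be written as follows; the symbols $h_{ii}$, $p$ and $s^2$ are standard regression notation introduced here:
$$D_i=\frac{r_i^2}{p\,s^2}\,\frac{h_{ii}}{(1-h_{ii})^2} \; ,$$
where $r_i$ is the ordinary residual, $h_{ii}$ the leverage (the $i$-th diagonal element of the hat matrix), $p=3$ the number of fitted coefficients in the PLC relation, and $s^2$ the residual mean square. A point therefore has a large $D_i$ if it combines a sizeable residual with high leverage; a high-leverage point with a very small residual has little effect on the coefficients.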

420:

421: It is interesting to examine the positions of the rejected observations in

422: three-dimensional dataplots. The plots in Figs. 13 and 14 were obtained by

423: selecting perspectives which clearly show the positions of all questionable

424: data. It is clear that the observations for each dataset lie close to a plane,

425: and that points with unsatisfactory regression diagnostics (marked by

426: squares) all deviate from the plane. The fact that the plane in Fig. 14

427: (OGLE data) is so well-defined explains why removal of the outlying points

428: made such a substantial difference to the estimated coefficients. In the

429: remainder of this paper we work with the reduced datasets ($N=1213, 717$

430: for MACHO and OGLE data respectively). Note that one high-influence datum

431: in the OGLE data is retained (for the brightest Cepheid -- see Fig. 14),

432: since its associated residual is very small, and since its omission

433: has very little influence on the values of the three estimated parameters.

434:

435: An obvious extension of the linear PLC relation to the nonparametric case

436: is the so-called ``Generalised Additive Model''

437: \begin{equation}

438: V=\alpha+f_P(\log P)+f_C(CI)+{\rm error}

439: \end{equation}

440: where $\alpha$ is a constant; $CI$ denotes a colour index;

441: and $f_P$ and $f_C$ are nonparametric

442: regression functions such as loess or TPRS fits. Owing to its several

443: attractive features (automated cross-validation, to mention but one),

444: the {\textsc R} add-on package mgcv is once again used to perform

445: TPRS fits of (8) to the data.

446:

447: The results can be seen in Figs. 15  and 16. The estimated $f_P$ for the

448: OGLE data is linear: the effective degrees of freedom, 1.00, confirms

449: this. By implication the model (8) reduces to

450: \begin{equation}

451: V=\alpha+\beta \log P+f_C(CI)+{\rm error} \;.

452: \end{equation}

453: Not surprisingly, the AICs of models (8) and (9)

454: are exactly equal for the OGLE data.

455:

456: The function $f_P$ for the MACHO data shows the familiar deviation from

457: linearity in the range $0.8<\log P<1$; this is more clearly demonstrated

458: in Fig. 17, where a linear fit to $f_P$ has been subtracted.

459:

460:

461: Inspection of the $f_C$ functions in Fig. 16 shows that both are

462: distinctly nonlinear.

463:

464: It is of obvious interest to investigate why $f_P$ reduces to the perfectly

465: linear form in the case of the OGLE data, when the dependence of $V$ on

466: $\log P$ in the PL relation is nonlinear. Examining the relationship between

467: $\log P$ and the colour index $(V-I)$ gives some insight into this question.

468: The results of a loess regression of $(V-I)$ on $\log P$ for the OGLE

469: data are displayed in

470: Fig. 18. The 95\% confidence intervals, obtained from 5000 bootstrap

471: samples, are also shown. Calculations were done using a smoothing

472: window of width 0.20, as indicated by cross-validation. The analogous

473: plot for the MACHO data, based on a smoothing window width of 0.33,

474: is in Fig. 19. In the case of the OGLE data there is a

475: clear change in the relationship between $\log P$ and $(V-I)$

476: in the neighbourhood $0.8<\log P<1$. It appears that small deviations from

477: linearity in the $PL$ relation in Fig. 8 are compensated by the colour dependence.

478: In the case of the MACHO data the kink in the $PC$ plot (Fig. 19) is of similar

479: size to that in Fig. 18, but the deviation from linearity in the $PL$ plot is

480: larger (Fig. 7). This may explain why the $f_P$ function remains nonlinear

481: in the case of the MACHO $PLC$ relation. These results support similar work

482: presented in Kanbur and Ngeow (2004) and Ngeow and Kanbur (2005)

483: on the non-linearity of the LMC PC relation using $F$

484: tests, and on the linearity of the LMC Wesenheit function.

485:

486: Nonparametric regression lends itself to much more flexible forms than

487: ordinary multiple regression. Two possible alternatives to (8) are

488: \begin{equation}

489: V=\alpha+f_P(\log P)+f_C(CI)+f_{PC}(\log P, CI)+{\rm error}

490: \end{equation}

491: and

492: \begin{equation}

493: V=\alpha+f_{PC}(\log P,CI)+{\rm error}

494: \end{equation}

495: which allow for interaction between the two independent variables.

496:

497: The two Generalized Additive Models (10) and (11) were

498: also fitted to both datasets.

499: For the OGLE data, the AIC-preferred model is (10), but

500: a more detailed analysis (ANOVA) shows that the contribution from

501: the interaction function $f_{PC}$ is not significant -- hence the

502: model effectively reduces to (8). For the MACHO data the pure

503: interaction model (11) is preferred, with (10) the second choice.

504: According to the AIC, the additive model (8) is a very distant

505: third choice. A contour plot of the fit of the model (11) can

506: be seen in Fig. 20 -- this demonstrates why (8) is inadequate.

507: Of course, in practice (11) would be more tedious to work with

508: than the simpler additive form (8).

509:

510: A few words of explanation of Fig. 20 may be in order. The form of

511: a purely linear PLC relation would of course be

512: $$V=a+b\log P+c CI+{\rm error} \; .$$

513: One way of displaying this graphically would be to draw the

514: lines

515: $$V={\rm constant}$$

516: in the $\log P$-$CI$ plane, for various values of the constant.

517: The equations describing these contour lines are

518: $$CI=({\rm constant}-a-b\log P)/c \; ,$$

519: i.e. straight lines with slope $-b/c$. Fig. 20, the

520: equivalent for the non-parametric function $f_{PC}$, shows

521: not only that the relations are nonlinear, but also that there

522: is ``interaction'' -- the form of the relation depends on the region

523: of the $\log P$--$(V-R)$ plane it inhabits.
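As a worked illustration of the linear case, setting $V$ equal to a constant $V_0$ in the linear PLC relation and solving for the colour index gives
$$CI=\frac{V_0-a}{c}-\frac{b}{c}\log P \; .$$
With the coefficients of (7), the contours of a strictly linear relation would be parallel straight lines of slope $-b/c=3.32/4.00=0.83$ for the MACHO data and $3.29/2.48 \approx 1.33$ for the OGLE data. Departures of the contours in Fig. 20 from a family of parallel lines of this kind indicate nonlinearity and interaction.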

524:

525: \section{CONCLUSIONS \& DISCUSSION}

526:

527: It should perhaps come as no surprise that with the acquisition of

528: large amounts of new data, finer detail in the relationships between

529: astrophysical observables is uncovered.

530: The best-fitting models of the two datasets are

531: given by (11) (MACHO) and (9) (OGLE) respectively, which

532: are both nonlinear.

533:

534: Estimates of the effect of such small non-linearities

535: on the Cepheid distance scale and

536: on Hubble's constant are given in Ngeow and Kanbur (2006c) and amount to 1--2\%.

537: Such an error seems small but in the

538: era of ``precision cosmology'' with a drive toward a distance scale accurate to

539: $5\%$, such an effect is important. Perhaps just

540: as important, a proper characterization of the precise detail in the observed

541: phenomena will assist in placing improved constraints

542: on pulsation models of Cepheids and in particular on their ML relations, and

543: hence on details of stellar evolutionary

544: physics such as the amount of convective core overshoot.

545:

546: A possible physical explanation for this non-linearity is outlined in the

547: papers by Kanbur et al. (2004), Kanbur \& Ngeow (2006) and

548: Kanbur et al. (2007b), which studied

549: Galactic, LMC and SMC Cepheid models respectively. Briefly, these papers

550: suggest the non-linearity is caused by the interaction of the hydrogen

551: ionization front (HIF) and photosphere and the way this interaction varies with

552: period. At low densities, if the HIF and photosphere are engaged

553: (i.e. the photosphere lies at the base of the HIF) then the temperature of

554: the photosphere and hence the colour of the star are almost independent

555: of global stellar properties such as the period. Since the relative location

556: of the photosphere and HIF varies with the $L/M$ ratio, and since this

557: varies with period, modelling has implied that for LMC Cepheids with a period

558: greater than 10 days, the photosphere and HIF are not engaged. Thus these

559: stars have a different PC relation than their shorter period counterparts.

560: Because the PC and PL relations are really forms of the PLC relation,

561: a change in the PC relation results in a change in the PL relation.

562: Galactic Cepheids are such that the HIF-photosphere interaction only really

563: occurs at maximum light at low densities. LMC Cepheids are such that this

564: HIF-photosphere interaction starts to occur at low densities only

565: for Cepheids with periods greater than 10 days. SMC Cepheids are such that

566: this HIF-photosphere interaction always occurs at high densities (Kanbur et al.

567: 2004; Kanbur \& Ngeow 2006; Kanbur et al. 2007b).

568:

569: \section*{\large \bf ACKNOWLEDGMENTS}

570: The authors are grateful for the efforts of those who have developed

571: and maintained the {\textsc R} statistical software. SMK acknowledges

572: support from a small research grant from

573: the American Astronomical Society and the Chr\'etien International Research Grant.

574: CN acknowledges financial support

575: from NSF award OPP-0130612 and a University of Illinois seed funding

576: award to the Dark Energy Survey.

577:

578:

579: \begin{references}

580: Burnham K.P., Anderson D.R., 2002, Model Selection and Multimodel Inference:

581:   a Practical Information-Theoretic Approach (Second Edition). Springer, New York

582:

583: Cleveland W.S., Devlin S.J., 1988, J. Amer. Stat. Assoc., 83, 597

584:

585: Conover W.J., 1971, Practical Nonparametric Statistics. John Wiley \&

586:     Sons Inc., New York

587:

588: Efron B., Tibshirani R.J., 1993, An Introduction to the Bootstrap.

589:     Chapman \& Hall, London

590:

591: Freedman, W., et al., 2001, ApJ, 553, 47

592:

593: Kanbur, S. \& Ngeow, C., 2004, MNRAS, 350, 962

594:

595: Kanbur, S., Ngeow, C. \& Buchler, R., 2004, MNRAS, 354, 212

596:

597: Kanbur, S. \& Ngeow, C., 2006, MNRAS, 369, 705

598:

599: Kanbur, S., Ngeow, C., Nanthakumar, A. \& Stevens, R., 2007a, PASP, 119, 512

600:

601: Kanbur S., Ngeow C., Feiden G., 2007b, submitted

602:

603: Macri L., Stanek K., Bersier D., Greenhill L., Reid M., 2006, ApJ, 652, 1133

604:

605: Madore B., Freedman W., 1991, PASP, 103, 933

606:

607: Montgomery D.C., Peck E.A., Vining G.G., 2001, Introduction to Linear

608:      Regression Analysis (Third Edition). John Wiley \& Sons, Inc., New

609:      York

610:

611: Ngeow, C. \& Kanbur, S., 2005, MNRAS, 360, 1033

612:

613: Ngeow, C., Kanbur, S., Nikolaev, S., Buonaccorsi, J., Cook, K. \& Welch, D.,

614:    2005, MNRAS, 363, 831

615:

616: Ngeow, C. \& Kanbur, S., 2006a, MNRAS, 369, 723

617:

618: Ngeow, C. \& Kanbur, S., 2006b, ApJ, 650, 180

619:

620: Ngeow, C. \& Kanbur, S., 2006c, ApJ, 642, L29

621:

622: Sandage, A., 1958, ApJ, 127, 513

623:

624: Sandage, A., Tammann, G. A. \& Reindl, B., 2004, A\&A, 424, 43

625:

626: Sebo, K., et al., 2002, ApJS, 142, 71

627:

628: Spergel D., et al., 2007, ApJ, in press (ArXiv:astro-ph/0603449)

629:

630: Tammann, G. A. \& Reindl, B., 2002, Astrophys. \& Space Sci., 280, 165

631:

632: Udalski, A., Soszynski, I., Szymanski, M., Kubiak, M., Pietrzynski, G.,

633:   Wozniak, P., \& Zebrun, K. 1999, Acta Astron., 49, 223

634:

635: Wood S., 2006, Generalized Additive Models. An Introduction with R.

636:   Chapman \& Hall/CRC, Boca Raton (Fl)

637: \end{references}

638:

639:

640: \pagebreak

641:

642:


644:

645: \begin{figure}

646: \epsfysize=8.0cm

647: \epsffile{fig1.eps}

648: \caption{MACHO PL data for LMC Cepheids. The line is a linear

649: least squares fit to the data.}

650: \end{figure}

651:

652: \begin{figure}

653: \epsfysize=8.0cm

654: \epsffile{fig2.eps}

655: \caption{OGLE PL data for LMC Cepheids. The line is a linear

656: least squares fit to the data.}

657: \end{figure}

658:

659: \begin{figure}

660: \epsfysize=8.0cm

661: \epsffile{fig3.eps}

662: \caption{Partial sums of the residuals from the fit in Fig. 1.}

663: \end{figure}

664:

665: \begin{figure}

666: \epsfysize=8.0cm

667: \epsffile{fig4.eps}

668: \caption{Partial sums of the residuals from the fit in Fig. 2.}

669: \end{figure}

670:

671: \begin{figure}

672: \epsfysize=8.0cm

673: \epsffile{fig5.eps}

674: \caption{An illustrative loess regression on the OGLE PL data. The window

675: width is 0.05, i.e. 5\% of the range of $\log P$.}

676: \end{figure}

677:

678: \begin{figure}

679: \epsfysize=9.0cm

680: \epsffile{fig6.eps}

681: \caption{Cross-validation functions for the loess window width $\alpha$,

682: for the MACHO (top) and OGLE (bottom) data.}

683: \end{figure}

684:

685: \begin{figure}

686: \epsfysize=8.0cm

687: \epsffile{fig7.eps}

688: \caption{A comparison of the optimal loess fit to the MACHO data, and

689: the linear regression from (1).}

690: \end{figure}

691:

692: \begin{figure}

693: \epsfysize=8.0cm

694: \epsffile{fig8.eps}

695: \caption{A comparison of the optimal loess fit to the OGLE data, and

696: the linear regression from (1).}

697: \end{figure}

698:

699: \begin{figure}

700: \epsfysize=8.0cm

701: \epsffile{fig9.eps}

702: \caption{The positions (with respect to the linear regression line)

703: of the upper and lower 95\% confidence limits

704: on the loess fit to the MACHO data.}

705: \end{figure}

706:

707: \begin{figure}

708: \epsfysize=8.0cm

709: \epsffile{fig10.eps}

710: \caption{The positions (with respect to the linear regression line)

711: of the upper and lower 95\% confidence limits

712: on the loess fit to the OGLE data.}

713: \end{figure}

714:

715: \begin{figure}

716: \epsfysize=8.0cm

717: \epsffile{fig11.eps}

718: \caption{Differences between the linear fit and the loess (black, less smooth)

719: and thin plate regression spline (red, smooth) results for the MACHO data.}

720: \end{figure}

721:

722: \begin{figure}

723: \epsfysize=8.0cm

724: \epsffile{fig12.eps}

725: \caption{Differences between the linear fit and the loess (black, less smooth)

726: and thin plate regression spline (red, smooth) results for the OGLE data.}

727: \end{figure}

728:

729:

730: \begin{figure}

731: \epsfysize=8.0cm

732: \epsffile{fig13.eps}

733: \caption{The 1216 observations constituting the MACHO dataset. Filled squares mark

734: the three points selected for deletion on the basis of residual diagnostics.}

735: \end{figure}

736:

737: \begin{figure}

738: \epsfysize=8.0cm

739: \epsffile{fig14.eps}

740: \caption{The 723 observations constituting the OGLE dataset. Filled squares mark

741: the six points selected for deletion on the basis of residual diagnostics.}

742: \end{figure}

743:

744:

745: \begin{figure}

746: \epsfysize=8.0cm

747: \epsffile{fig15.eps}

748: \caption{The regression functions $f_P$ [see Eqn. (8)] for the OGLE (top) and

749: MACHO (bottom) data. The $\pm 2$ standard error confidence limits are plotted

750: as solid lines: these are indistinguishable from the functions except for

751: the longer period MACHO data.}

752: \end{figure}

753:

754: \begin{figure}

755: \epsfysize=8.0cm

756: \epsffile{fig16.eps}

757: \caption{The regression functions $f_C$ [see Eqn. (8)] for the MACHO (left) and

758: OGLE (right) data. The $\pm 2$ standard error confidence limits are plotted

759: as solid lines.}

760: \end{figure}

761:

762: \begin{figure}

763: \epsfysize=8.0cm

764: \epsffile{fig17.eps}

765: \caption{The regression function $f_P$ for the MACHO data (see Fig. 15, bottom plot)

766: prewhitened by a linear fit, in order to show more clearly the deviations from

767: linearity. The $\pm 2$ standard error bounds are also plotted.}

768: \end{figure}

769:

770: \clearpage

771:

772: \begin{figure}

773: \epsfysize=8.0cm

774: \epsffile{fig18.eps}

775: \caption{A loess regression function fitted to the $\log P$--$(V-I)$ data from

776: the OGLE observations. The solid lines are the 95\% confidence envelopes, obtained

777: by bootstrapping.}

778: \end{figure}

779:

780: \begin{figure}

781: \epsfysize=8.0cm

782: \epsffile{fig19.eps}

783: \caption{A loess regression function fitted to the $\log P$--$(V-R)$ data from

784: the MACHO observations. The solid lines are the 95\% confidence envelopes, obtained

785: by bootstrapping.}

786: \end{figure}

787:

788: \begin{figure}

789: \epsfysize=13.0cm

790: \epsffile{fig20.ps}

791: \caption{A contour plot of the function $f_{PC}$ in (11) fitted to the MACHO data.

792: The contour values decrease from $+1.5$ at the top left, in steps of 0.5, to $-2$ at

793: the extreme right. The $\pm 1$ standard error bounds for each contour line are also

794: shown.}

795: \end{figure}

796:

797:

798: \end{document}

799: