0706.1762/PLC.tex
1: \documentclass{article}
2: \usepackage{epsf,graphicx}
3: % --- page style definitions ---
4: 
5: \setlength{\textwidth}{17cm}
6: \setlength{\textheight}{20cm}
7: \setlength{\oddsidemargin}{0pt}
8: \setlength{\evensidemargin}{0pt}
9: \setlength{\topmargin}{0pt}
10: 
11: \newlength{\refindent}
12: \setlength{\refindent}{\parindent}
13: \newlength{\parskiplen}
14: \setlength{\parskiplen}{2.5mm}
15: 
16: \def\btheta{\mbox{\boldmath$\theta$}}
17: 
18: \begin{document}
19: 
20: % --- reference and figure caption environments ---
21: 
22: \newenvironment{references}{\clearpage
23: 			    \section*{\large \bf REFERENCES}
24: 			    \parindent=0mm \everypar{\hangindent=3pc
25: 			    \hangafter=1}}{\parindent=\refindent \clearpage}
26: \newenvironment{figcaps}{\clearpage
27: 			 \section*{\large  \bf FIGURE CAPTIONS}}{}
28: \newcommand{\fig}[2]{\parbox[t]{2.0cm}{Figure #1:} \
29: 		   \parbox[t]{13.5cm}{#2}\\[\baselinestretch\parskiplen]}
30: 
31: % --- title ---
32: 
33: \begin{titlepage}
34: \begin{center}
35: \vspace*{0.5cm}
36: {\huge The Detailed Forms of the LMC Cepheid}\\[0.5cm]
37: {\huge PL and PLC Relations}\\[3.0cm]
38: 
39: {\large C. Koen$^1$, S. Kanbur$^2$ and C. Ngeow$^3$}\\[1cm]
40: \normalsize
41: {\em  1 Department of Statistics, University of the Western Cape,
42: Private Bag X17, Bellville, 7535 Cape, South Africa}\\[0.6cm]
43: {\em  2 Department of Physics, State University of New York at Oswego, 
44: Oswego, NY 13126, USA}\\[0.6cm]
45: {\em  3 Department of Astronomy, University of Illinois, 
46: Urbana-Champaign, IL 61801, USA}\\[2.0cm]
47: 
48: \end{center}
49: 
50: 
51: 
52: %\setlength{\baselineskip}{0.8cm}
53: \begin{quotation}\noindent{\bf ABSTRACT.}
54: Possible deviations from linearity of the LMC Cepheid PL and PLC relations are
55: investigated. Two datasets are studied, respectively from the OGLE and MACHO 
56: projects. A nonparametric test, based on linear regression residuals,
57: suggests that neither PL relation is linear. If colour dependence is allowed for,
58: then the MACHO PL relation is found to deviate more significantly from linearity,
59: while the OGLE PL relation is consistent with linearity. These findings are confirmed
60: by fitting ``Generalised Additive Models'' (nonparametric regression functions)
61: to the two datasets. Colour dependence is shown to be nonlinear in both datasets,
62: distinctly so in the case of the MACHO Cepheids. It is also shown that there is
63: interaction between the period and colour functions in the MACHO data. 
64: 
65: \vspace*{1.0cm}
66: {\bf Key words:} methods: statistical - stars: variables: Cepheids - 
67: cosmology: distance scale
68: \end{quotation}
69: 
70: 
71: 
72: \end{titlepage}
73: 
74: \section{INTRODUCTION}
75: 
76: Cepheids are important objects in Astrophysics, both because of their use in the 
77: extra-galactic distance
78: scale and their role in stellar evolution. Their regularly repeating 
79: light curves offer an important opportunity
80: to test theories of stellar evolution against stellar pulsation: mass-luminosity 
81: (ML) relations mandated from evolutionary calculations
82: can be used as input to full linear and non-linear hydrodynamic models of 
83: Cepheids and compared to observations. These ML relations contain
84: input about evolutionary physics such as the amount of convective overshoot. 
85: Constraining theoretical models with observations can be used to gain 
86: considerable insight into
87: evolutionary/pulsation physics. On the other hand the Cepheid period-luminosity 
88: (PL) relation has played an important role in establishing the
89: extra-galactic distance scale and the subsequent estimation of Hubble's constant, 
90: $H_0$. The $HST$ Key Project (Freedman et al. 2001) has used $HST$ observations of
91: Cepheids in a number of galaxies to estimate $H_0$ to within $10\%$ accuracy. 
92: The crucial step in this work has been the Cepheid PL relation in the
93: Large Magellanic Cloud (LMC) which has been used to characterize a Cepheid PL 
94: relation template. This PL template has
95: traditionally been thought to be linear; however, there has also been recent work
96: implying a variation of the slope with period in the LMC (Tammann \& Reindl 2002; 
97: Kanbur \& Ngeow 2004, 2006; Sandage et al. 2004; Kanbur et al. 2007a; 
98: Ngeow et al. 2005; Ngeow \& Kanbur 2006a,b). 
99: 
100: Ngeow and Kanbur (2006c) estimate the error in $H_0$ that results if a linear
101: Cepheid PL relation is assumed
102: when the underlying relation is ``non-linear''
103: at a period of 10 days, and find that this can lead to an error of about $1$--$2\%$.
104: Such an error seems small but with significant
105: work being carried out to reduce zero point errors (Macri et al. 2006), it is
106: important to construct as accurate a distance scale as possible that is independent of
107: the CMB. Further, table 2 of Spergel et al. (2007) points to the fact that an
108: independent estimate of $H_0$, accurate to better than
109: $5\%$, will help to break the degeneracy between $\Omega_{\rm matter}$ and $H_0$
110: present in WMAP CMB studies. An independent estimate
111: of $H_0$ accurate to $1\%$ will result in a reduction of the $65\%$ confidence
112: interval on $\Omega_{\rm matter}$ by almost a factor of two over
113: that with WMAP data alone. 
114: 
115: In previous studies, a rigorous statistical test, the $F$ test, was 
116: applied to the LMC Cepheids to test for the linear versus non-linear 
117: PL relation. Here by ``non-linear'' we mean two lines of significantly 
118: differing slope which are continuous at a period of 10 days. The $F$ test
119:  results that were obtained from the OGLE (Optical Gravitational Lensing Experiment, 
120: Udalski et al. 1999) and MACHO Cepheid data, in Kanbur \& Ngeow (2004; 2006) 
121: and Ngeow et al. (2005) respectively, strongly imply that the LMC PC/PL 
122: relations are non-linear. It is important to note that several other 
123: statistical tests, such as the ${\chi}^2$ tests, least absolute deviation, 
124: robust estimation and loess procedures, were also applied to the MACHO data, 
125: and these results also point to a non-linear LMC PL relation 
126: (Ngeow et al. 2005). Recently, Kanbur et al. (2007a) developed the use of
127: testimators and a likelihood based method using the Schwarz Information 
128: Criterion, to study non-linearities in the LMC PL relation (using both 
129: OGLE and MACHO Cepheid data) and again came to the same conclusion: the 
130: LMC Cepheid PL relation is non-linear in the sense described above. The 
131: $F$ test also suggested that the LMC period-colour (PC) relation is 
132: non-linear, in contrast to the Galactic and SMC (Small Magellanic Cloud) 
133: PC relations (Kanbur \& Ngeow 2004). Since the question of the non-linearity 
134: of the LMC PL relation is important in distance scale and stellar studies, 
135: it is vital to establish this as firmly as possible; this is one of 
136: the motivations for this paper.
137: 
138: In addition to investigating the non-linearity of the LMC PL relation, we 
139: also study the LMC period-luminosity-colour (PLC) relation.
140: A number of authors, including Sandage (1958) and Madore and Freedman (1991),
141: have derived the PLC relation and shown how it arises from
142: the period-mean density theorem, the Stefan-Boltzmann law and the existence of 
143: an instability strip. These authors also point out that the PL/PC relations are 
144: obtained from the PLC relation by averaging over the variable not included
145: in the relation. 
146: 
147: In Section 2, we briefly describe the
148: data used in our study. In Section 3 we apply a preliminary test
149: to the LMC PL relation. This is followed by a more detailed analysis in Section 4,
150: based on a non-parametric model fitting procedure. 
151: An extension to the PLC relation is presented in Section 5. 
152: The conclusion and discussion of our results are given in Section 6.
153: 
154: We add a few sentences on the use of non-parametric
155: methods in what follows. The term ``non-parametric'' is actually used in three slightly
156: different senses. First, the major innovation (Sections 4 and 5) in this paper is the
157: use of ``non-parametric regression''. The meaning is {\it not} necessarily the usual
158: one of ``distribution-free'': rather, it means that the form of the regression is not
159: specified -- the regression function is ``unstructured'', being dictated by the data
160: itself. Of course, this flexibility allows one to detect subtleties which may
161: otherwise be overlooked. Second, in Section 3 we use a well-known
162: distribution-free statistic, the ``Wald-Wolfowitz runs test''. This non-parametric
163: statistic uses only data ranks, and hence is typically not very powerful. Third, also
164: in Section 3, use is made of a permutation method. This avoids
165: distributional assumptions about the data, by
166: using re-orderings of the data itself to establish significance levels.
167: 
168: \section{THE DATA}
169: 
170: We use two sets of LMC Cepheid data in our study. The first data set is the 
171: extinction corrected $V$-band mean magnitudes and $(V-I)$ colours for the OGLE 
172: LMC Cepheids taken from Kanbur \& Ngeow (2006), supplemented with additional 
173: Cepheids from Sebo et al. (2002), and referred to as ``OGLE'' data in this paper.
174: The second data set is the MACHO Cepheid data, with extinction corrected $V$
175: mean magnitudes and $(V-R)$ colours, adopted from Ngeow et al. (2005).
176: Using these two data sets allows us to compare the results, particularly for the
177: different photometric filters used.
178: 
179: A possible complication is that any apparent non-linearity in PL or PLC
180: relations could be caused by 
181: extinction errors which are a function of colour or period.
182: Arguments against extinction errors as a cause of observed non-linear LMC PL 
183: and PC relations were presented in Kanbur \& Ngeow (2004), Kanbur \& Ngeow (2006),
184: Kanbur et al. (2007b), Ngeow et al. (2005), Ngeow \& Kanbur (2006b) and
185: Sandage et al. (2004), and will therefore not be repeated in detail here.
186: In particular, a possible period dependency of extinction errors has been 
187: investigated in Ngeow \& Kanbur (2006b). If such extinction errors were present, 
188: then the PC relations at maximum light would be such that LMC Cepheids would get 
189: hotter at maximum light as the pulsation period increases: a fact which would
190: be hard to reconcile with pulsation theory especially as Galactic Cepheids, 
191: in common with LMC Cepheids, display a flat PC relation at
192: maximum light (Kanbur \& Ngeow 2004, 2006). Further, the dependence of extinction 
193: error on colour would need to
194: be very complicated to explain the non-linearity at mean light whilst
195: preserving the flatness at maximum light. 
196: 
197: It is also noted that the reddening values adopted here are the {\it same} 
198: as those used in many distance scale studies (Freedman et al. 2001).
199: 
200: \section{A PRELIMINARY INVESTIGATION BASED ON A TEST PROCEDURE}
201: 
202: Figs. 1 and 2 show the MACHO and OGLE PL data, with least squares
203: linear fits of the form
204: \begin{equation}
205: V=a+b \log P +{\rm error} \; .
206: \end{equation}
207: For the sake of completeness, the fitted relations are
208: \begin{eqnarray}
209: V&=&17.08(0.026)-2.70(0.039) \log P \;\;\;\;\;{\rm (MACHO)}\nonumber\\
210: V&=&17.05(0.020)-2.69(0.028) \log P \;\;\;\;\;\;{\rm (OGLE)}
211: \end{eqnarray}
212: where standard errors of coefficient estimates are given in brackets.
213: Although both fits are excellent, it is nonetheless
214: of some interest whether there may be subtle deviations from the strictly
215: linear relations between $V$ and $\log P$ shown by the lines:
216: although this may have little importance for prediction of luminosity
217: given the period, it could (e.g.) have an important bearing on the modelling
218: of Cepheid pulsations.
219: 
220: A simple procedure which provides some insight into the problem is
221: to study partial sums of the residuals of the least squares fits. 
222: First arrange the data so that the period values are in ascending order:
223: $$P_1 <P_2<P_3<\ldots <P_N $$
224: where $N$ is the sample size. Then
225: \begin{equation}
226: C(j)=\sum_{k=1}^j [V_k-a-b \log P_k]=\sum_{k=1}^j r_k
227: \end{equation}
228: are the partial sums of the residuals $r_k$. If there are no deviations 
229: from linearity, then $C(j)$ is the sum of uncorrelated random numbers and 
230: hence a simple random walk. However, if there are deviations 
231: from linearity successive residuals may be correlated, and hence $C(j)$
232: will not be a simple random walk. Partial sums of the $r_k$ can be
233: seen in Figs. 3 and 4.
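
A short worked remark may help in reading Figs. 3 and 4. It assumes only
that the least squares fit (1) includes the intercept $a$, as it does here.
The normal equation for $a$ forces the residuals to sum to zero, so that
$$C(N)=\sum_{k=1}^N r_k=0 \; .$$
Every partial sum path therefore returns exactly to zero at $j=N$: under the
hypothesis of linearity it behaves as a random walk tied down at its end
point rather than as a free one. The same constraint holds for any
re-ordering of the $r_k$, so a comparison of the observed path with paths
built from permuted residuals remains a like-for-like comparison.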
234: 
235: A statistic which can be used for testing whether the partial sum
236: is a pure random walk is its vertical range
237: $$R=\max_j C(j)-\min_j C(j) \; :$$
238: this may be expected to be inflated by positively correlated residuals. Significance
239: levels for the values of $R$ are readily obtained by permutation, as 
240: follows:
241: \begin{itemize}
242: \item[(i)]
243: Permute the $r_k$; this will randomise the residuals by destroying any
244: possible trends.
245: \item[(ii)]
246: The partial sums of the permuted $r_k$ will be true random walks -- 
247: find the statistic $R$ for the permutation.
248: \item[(iii)]
249: Repeat steps (i) and (ii) a large number of times, noting the values of
250: $R$. 
251: \item[(iv)]
252: Determine the fraction of permutation $R$-values which exceeds the observed
253: value -- this estimates the significance level of the observed $R$.
254: \end{itemize}
255: Applying 10000 permutations, significance levels of 3\% and 4\% were
256: obtained for the MACHO and OGLE data respectively, suggesting 
257: meaningful deviation of the observed $r_k$ from randomness. The implication
258: is therefore that the PL relation is not perfectly linear.
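
In symbols, and purely as a sketch of steps (i)-(iv) (the notation $M$,
$R^{*}_m$, $R_{\rm obs}$ and $\widehat{p}$ is introduced here for
illustration and is not used elsewhere in the paper), if
$R^{*}_1,\ldots,R^{*}_M$ are the ranges obtained from $M$ permutations and
$R_{\rm obs}$ is the observed range, then the estimated significance level is
$$\widehat{p}=\frac{1}{M}\,\#\{m: R^{*}_m > R_{\rm obs}\} \; .$$
Treating the permutations as independent draws, the Monte Carlo standard
error of $\widehat{p}$ is approximately $\sqrt{\widehat{p}(1-\widehat{p})/M}$.
For $M=10000$ and $\widehat{p}=0.03$ this is
$\sqrt{0.03\times 0.97/10000}\approx 0.0017$, so levels of a few per cent are
not materially affected by the finite number of permutations.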
259: 
260: Study of Figs. 3 and 4 shows that there is an excess of positive residuals 
261: for $\log P \sim 0.5$ and $\log P>1$, and an excess of negative values
262: for $0.8<\log P<1$.
263: 
264: Interestingly, application of the standard Wald-Wolfowitz runs test
265: (e.g. Conover 1971) for randomness of the residuals gives conflicting results
266: for the two datasets -- significance levels of 45\% and 0.9\% for
267: the OGLE and MACHO data respectively. Of course, the procedure
268: uses only the signs, and not the sizes, of the $r_k$.
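
For reference, one common large-sample form of the runs test is sketched
here; the text does not state which version was used, so the formulae below
are an assumption about the implementation. Let $n_+$ and $n_-$ be the
numbers of positive and negative residuals (taken in period order),
$n=n_++n_-$, and $U$ the number of runs of like sign. Under randomness
$$E(U)=1+\frac{2n_+n_-}{n} \; , \;\;\;\;\;
{\rm var}(U)=\frac{2n_+n_-(2n_+n_--n)}{n^2(n-1)} \; ,$$
and $z=[U-E(U)]/\sqrt{{\rm var}(U)}$ is approximately standard normal.
Smooth departures from linearity produce long stretches of residuals of the
same sign, and hence fewer runs than expected ($z<0$).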
269: 
270: It is known that Cepheids follow a PLC, rather than simply a PL,
271: relation. It may therefore be prudent to replace (3) by
272: \begin{equation}
273: C(j)=\sum_{k=1}^j [V_k-a-b \log P_k-c(CI)_k]
274: \end{equation}
275: where $(CI)$ indicates a colour index, with regression coefficient
276: $c$. This has a substantial influence on the significance levels 
277: of the statistic $R$: for the OGLE data it increases to 33\%, while
278: the level for the MACHO data is reduced to 0.7\%. The corresponding
279: Wald-Wolfowitz test levels are 43\% and 1.5\%. 
280: 
281: To summarise, there is strong evidence of non-randomness in the residuals
282: of the MACHO data, both for the PL and the PLC relations. For the OGLE
283: data the results are ambiguous. 
284: 
285: 
286: \section{PL RELATION}
287: 
288: An alternative to the imposition of a fully specified parametric
289: model such as (1) is to allow the form of the regression to be
290: dictated by the data. The idea is conveniently illustrated by
291: a technique known as ``loess'' (see e.g. Cleveland \& Devlin 1988).
292: Ngeow et al. (2005)
293: initially used this method on MACHO data and found a similar result to 
294: that reported here. Here we study it in more detail and apply it to both 
295: MACHO and OGLE Cepheid data.
296: The method entails fitting a low order polynomial (in the present
297: case a straight line) over restricted sections (``windows'') of the
298: data by weighted least squares. In the implementation here the only
299: free parameter is the width $\alpha$ of the window, which is usually
300: given as a fraction of the range of the independent variable (i.e.
301: $0<\alpha \le 1$). The smaller $\alpha$,
302: the more ``local'' the estimated regression, and the more detail
303: it shows. Fig. 5 shows a loess regression of the OGLE data, using
304: $\alpha=0.05$; if $\alpha$ is increased towards unity the loess regression
305: resembles the linear fit of Fig. 2. 
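
A minimal sketch of the local fit at a target value $x_0$ of $x=\log P$
follows. The tricube weight is the customary loess choice
(Cleveland \& Devlin 1988), but the text does not specify the weight
function, so it should be read as an assumption; $h$ denotes the distance
from $x_0$ beyond which points receive zero weight, as set by $\alpha$.
The local line is found by minimising, over $a_0$ and $b_0$,
$$\sum_{k=1}^N w_k(x_0)\,[V_k-a_0-b_0(x_k-x_0)]^2 \; , \;\;\;\;\;
w_k(x_0)=\left[1-\left|\frac{x_k-x_0}{h}\right|^3\right]^3
\;\; {\rm for} \;\; |x_k-x_0|<h \; ,$$
with $w_k(x_0)=0$ otherwise. The loess estimate at $x_0$ is then
$\widehat{V}(x_0)=a_0$. As $h$ becomes large all the weights tend to unity
and the fit tends to the ordinary least squares line, which is the
behaviour described above for $\alpha$ approaching unity.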
306: 
307: A key element is then obviously the choice of window width $\alpha$, and
308: it is desirable to use an objective method to find it. This is readily
309: done by ``cross-validation'':
310: \begin{itemize}
311: \item[(i)]
312: Choose a value of the window width $\alpha$.
313: \item[(ii)]
314: Leave out the first datapoint and obtain a loess estimate  
315: $\widehat{V}_1$ of the magnitude $V_1$
316: by fitting the regression to the remaining data.
317: \item[(iii)]
318: Note the discrepancy 
319: $$\Delta_1=V_1-\widehat{V}_1$$
320: between the true and predicted values.
321: \item[(iv)]
322: Repeat steps (ii)-(iii) for the second, third,..., last datapoints,
323: giving the set $\Delta_1, \Delta_2,\ldots,\Delta_N$ of discrepancies.
324: \item[(v)]
325: The value of the cross-validation criterion for the value of $\alpha$
326: from (i) is then defined as
327: \begin{equation}
328: CV(\alpha)=\frac{1}{N} \sum_{j=1}^N \Delta_j^2
329: =\frac{1}{N} \sum_{j=1}^N (V_j-\widehat{V}_j)^2
330: \end{equation}
331: Clearly, it evaluates the predictive power over all the observations
332: of the loess fit based on the particular value of $\alpha$.
333: \item[(vi)]
334: Repeat steps (i)-(v) for all candidate values of $\alpha$.
335: \item[(vii)]
336: The optimal $\alpha$ is that which minimises $CV(\alpha)$.
337: \end{itemize}
338: 
339: The cross-validation functions for the two datasets are plotted in Fig. 6;
340: optimal window widths are 0.36 and 0.20 
341: respectively for the MACHO and OGLE observations. In Figs. 7 and
342: 8 the resultant loess
343: functions are compared to the regression lines from (1). A small difference
344: between the curves over the approximate interval $0.8<\log P<1$ is visible
345: in both diagrams. There is also a substantial disagreement at the longest
346: periods for the MACHO results in Fig. 7: this is clearly due to the 
347: {\it systematic} difference between the data and the linear regression
348: line for $\log P>1.25$ (see Fig. 1). Similarly, the slight divergence 
349: between the loess and linear regression lines at the longest periods
350: in Fig. 8, can be traced to the influence of the two OGLE datapoints with
351: $\log P>1.7$ (see Fig. 2).
352: 
353: The question arises as to whether the discrepancies between the loess
354: curves and the straight line fits are at all meaningful. In order
355: to address this issue confidence intervals for the loess curves are 
356: estimated by bootstrapping (e.g. Efron \& Tibshirani 1993). The results,
357: based on 5000 bootstrap samples, are plotted in Figs. 9 and 10. 
358: Rather than showing the linear regression line and the 95\% upper and
359: lower limits, the {\it differences} between the linear fit and the
360: confidence limits are plotted, in order to more clearly display
361: the deviations. It is notable that the linear fits lie outside the
362: confidence intervals for the loess functions for $0.8<\log P<1$ roughly.
363: This supports previous work which has suggested a ``break'' around a
364: period $\log P \approx 1$
365: (Kanbur \& Ngeow 2004, Ngeow et al. 2005, Kanbur et al. 2007a).
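
The construction of such limits can be sketched as follows, assuming (the
text does not say) that the data pairs $(\log P_k, V_k)$ are resampled with
replacement and that simple percentile limits are used; the notation $B$
and $\widehat{V}^{*}_b$ is introduced here. For each of the $B$ bootstrap
samples the loess curve $\widehat{V}^{*}_b(x)$ is recomputed on a fixed grid
of $x=\log P$ values. At each grid point the $B$ values are sorted, and the
95\% limits are taken as the empirical quantiles
$$\left[\widehat{V}^{*}_{(0.025B)}(x), \;
\widehat{V}^{*}_{(0.975B)}(x)\right] \; .$$
With $B=5000$ these are the 125th and 4875th ordered values. Limits of this
kind are pointwise: they describe the uncertainty at each period
separately, not the uncertainty of the curve as a whole.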
366: 
367: The {\textsc R} software add-on package ``mgcv'' contains an alternative
368: nonparametric regression facility in the form of thin plate regression
369: splines (TPRS) (e.g. Wood 2006). The form of cross-validation used is based
370: on a balance between the sum of squared model residuals (which measures
371: the goodness of the model fit) and a smoothness term. Cross-validation in
372: mgcv is automated.
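
As a sketch of what this balance looks like for a single covariate
$x=\log P$ (this is the generic penalised least squares form described by
Wood 2006, written in notation introduced here, and is not a transcription
of the package internals), the smooth function $f$ minimises
$$\sum_{j=1}^N [V_j-f(x_j)]^2+\lambda \int [f''(x)]^2 dx \; ,$$
where $\lambda \ge 0$ controls the smoothness: $\lambda \to \infty$ forces
$f$ towards a straight line, while $\lambda \to 0$ allows $f$ to follow the
data closely. A common automated choice of $\lambda$ is the value which
minimises the generalised cross-validation score
$${\rm GCV}(\lambda)=\frac{N \sum_{j=1}^N [V_j-\widehat{f}_\lambda(x_j)]^2}
{[N-{\rm tr}(A_\lambda)]^2} \; ,$$
in which $A_\lambda$ is the matrix mapping the observed $V_j$ to the fitted
values, and ${\rm tr}(A_\lambda)$ is the effective degrees of freedom of the
fit.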
373: 
374: The loess and TPRS results are compared for the MACHO and OGLE data respectively
375: in Figs. 11 and 12. The agreement is very good -- in particular, the deviations 
376: from linearity for $0.8<\log P<1$ are also evident in the TPRS results.
377: Despite the fact
378: that more effective degrees of freedom are required for the nonparametric
379: fits (6.41 and 8.71 for the TPRS fits to the MACHO and OGLE data respectively)
380: than for linear regression (3 degrees of freedom), the former fits follow
381: the data considerably more closely. Model selection tools such as the
382: ``Akaike Information Criterion'' (AIC, e.g. Burnham \& Anderson 2002) can be used to test
383: whether the improved model fit warrants the additional degrees of freedom expended.
384: In this case, the TPRS fits are both preferred by very wide margins. 
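
For orientation, the criterion takes the following form; the Gaussian
simplification is an assumption made here for illustration. With maximised
likelihood $L$ and $k$ (effective) degrees of freedom,
$${\rm AIC}=-2\ln L+2k \; ,$$
which for independent Gaussian errors with residual sum of squares RSS
becomes, up to an additive constant common to all models fitted to the same
data,
$${\rm AIC}=N\ln({\rm RSS}/N)+2k \; .$$
The model with the smaller AIC is preferred. A more flexible fit is
therefore favoured only if it lowers $N\ln({\rm RSS}/N)$ by more than twice
the number of extra degrees of freedom it uses.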
385: 
386:  
387: \section{PLC RELATION}
388: 
389: Unusual datapoints can have substantial, often somewhat distorting,
390: influences on regression surfaces. It is therefore worthwhile examining
391: the datasets carefully in order to identify such data. This is most
392: easily done using ordinary multiple linear least squares regression.
393: 
394: Fitting PLC relations to the two datasets gives the results
395: \begin{eqnarray}
396: V&=&16.23(0.026)-3.30(0.029) \log P +3.95(0.093)(V-R)\;\;\;\;\;{\rm (MACHO)}
397: \nonumber\\
398: V&=&15.97(0.025)-3.23(0.018) \log P +2.30(0.049)(V-I)\;\;\;\;\;\;{\rm (OGLE)}
399: \end{eqnarray}
400: with residual standard deviations 0.164 and 0.097 mag.
401: Regression diagnostics were examined in order to identify observations
402: which gave rise to large residuals and/or were unduly influential on
403: parameter estimates. ``Cook's $D$'' statistic was used for the latter
404: purpose -- see e.g. Montgomery, Peck \& Vining (2001) (or almost any
405: other modern text devoted to linear regression theory). Three points were 
406: eliminated
407: from the MACHO data, and four from the OGLE data, on the basis of these 
408: diagnostics.
409: The PLC relations were then re-estimated for the reduced datasets, and the
410: new sets of diagnostics examined. This led to a further two deletions from
411: the OGLE data. The final results, replacing (6), are 
412: \begin{eqnarray}
413: V&=&16.23(0.026)-3.32(0.029) \log P +4.00(0.092)(V-R)\;\;\;\;\;{\rm (MACHO)}
414: \nonumber\\
415: V&=&15.89(0.021)-3.29(0.015) \log P +2.48(0.041)(V-I)\;\;\;\;\;\;{\rm (OGLE)}
416: \end{eqnarray}
417: with residual standard deviations of 0.162 and 0.074 mag. The substantial 
418: reduction in residual variance, and large changes in regression 
419: coefficients for the OGLE results are particularly striking.
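
For reference, the usual definition of the Cook's $D$ statistic used above
(see e.g. Montgomery et al. 2001) is sketched here in notation introduced
for the purpose. With $p=3$ fitted coefficients, residual $r_i$, residual
mean square $s^2$ and leverage $h_{ii}$ (the $i$-th diagonal element of the
hat matrix),
$$D_i=\frac{r_i^2}{p\,s^2}\,\frac{h_{ii}}{(1-h_{ii})^2} \; .$$
A point has large $D_i$ if it combines a sizeable residual with high
leverage, i.e. an extreme position in the $(\log P, CI)$ plane. A datum
with high leverage but a very small residual has a small $D_i$ and little
effect on the estimated coefficients.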
420: 
421: It is interesting to examine the positions of the rejected observations in
422: three-dimensional dataplots. The plots in Figs. 13 and 14 were obtained by
423: selecting perspectives which clearly show the positions of all questionable
424: data. It is clear that the observations for each dataset lie close to a plane,
425: and that points with unsatisfactory regression diagnostics (marked by
426: squares) all deviate from the plane. The fact that the plane in Fig. 14
427: (OGLE data) is so well-defined explains why removal of the outlying points
428: made such a substantial difference to the estimated coefficients. In the
429: remainder of this paper we work with the reduced datasets ($N=1213, 717$
430: for MACHO and OGLE data respectively). Note that one high-influence datum
431: in the OGLE data is retained (for the brightest Cepheid -- see Fig. 14),
432: since its associated residual is very small, and since its omission
433: has very little influence on the values of the three estimated parameters.
434: 
435: An obvious extension of the linear PLC relation to the nonparametric case 
436: is the so-called ``Generalised Additive Model''
437: \begin{equation}
438: V=\alpha+f_P(\log P)+f_C(CI)+{\rm error} 
439: \end{equation}
440: where $\alpha$ is a constant; $CI$ denotes a colour index; 
441: and $f_P$ and $f_C$ are nonparametric 
442: regression functions such as loess or TPRS fits. Owing to its several
443: attractive features (automated cross-validation, to mention but one)
444: the {\textsc R} add-on package mgcv is once again used to perform
445: TPRS fits of (8) to the data.
446: 
447: The results can be seen in Figs. 15  and 16. The estimated $f_P$ for the 
448: OGLE data is linear: the effective degrees of freedom, 1.00, confirms
449: this. By implication the model (8) reduces to
450: \begin{equation}
451: V=\alpha+\beta \log P+f_C(CI)+{\rm error} \;.
452: \end{equation}
453: Not surprisingly, the AICs of models (8) and (9) 
454: are exactly equal for the OGLE data.
455: 
456: The function $f_P$ for the MACHO data shows the familiar deviation from
457: linearity in the range $0.8<\log P<1$; this is more clearly demonstrated
458: in Fig. 17, where a linear fit to $f_P$ has been subtracted. 
459: 
460: 
461: Inspection of the $f_C$ functions in Fig. 16 shows that both are 
462: distinctly nonlinear.
463: 
464: It is of obvious interest to investigate why $f_P$ reduces to the perfectly
465: linear form in the case of the OGLE data, when the dependence of $V$ on 
466: $\log P$ in the PL relation is nonlinear. Examining the relationship between
467: $\log P$ and the colour index $(V-I)$ gives some insight into this question.
468: The results of a loess regression of $(V-I)$ on $\log P$ for the OGLE
469: data are displayed in
470: Fig. 18. The 95\% confidence intervals, obtained from 5000 bootstrap
471: samples, are also shown. Calculations were done using a smoothing 
472: window of width 0.20, as indicated by cross-validation. The analogous 
473: plot for the MACHO data, based on a smoothing window width of 0.33,
474: is in Fig. 19. In the case of the OGLE data there is a
475: clear change in the relationship between $\log P$ and $(V-I)$ 
476: in the neighbourhood $0.8<\log P<1$. It appears that small deviations from
477: linearity in the $PL$ relation in Fig. 8 are compensated by the colour dependence.
478: In the case of the MACHO data the kink in the $PC$ plot (Fig. 19) is of similar 
479: size to that in Fig. 18, but the deviation from linearity in the $PL$ plot is
480: larger (Fig. 7). This may explain why the $f_P$ function remains nonlinear
481: in the case of the MACHO $PLC$ relation. These results support similar work
482: presented in Kanbur and Ngeow (2004) and Ngeow and Kanbur (2005) 
483: on the non-linearity of the LMC PC relation using $F$ 
484: tests, and on the linearity of the LMC Wesenheit function.
485: 
486: Nonparametric regression lends itself to much more flexible forms than
487: ordinary multiple regression. Two possible alternatives to (8) are
488: \begin{equation}
489: V=\alpha+f_P(\log P)+f_C(CI)+f_{PC}(\log P, CI)+{\rm error} 
490: \end{equation}
491: and
492: \begin{equation}
493: V=\alpha+f_{PC}(\log P,CI)+{\rm error} 
494: \end{equation}
495: which allow for interaction between the two independent variables.
496: 
497: The two Generalised Additive Models (10) and (11) were
498: also fitted to both datasets.
499: For the OGLE data, the AIC-preferred model is (10), but 
500: a more detailed analysis (ANOVA) shows that the contribution from
501: the interaction function $f_{PC}$ is not significant -- hence the
502: model effectively reduces to (8). For the MACHO data the pure
503: interaction model (11) is preferred, with (10) the second choice. 
504: According to the AIC, the additive model (8) is a very distant 
505: third choice. A contour plot of the fit of the model (11) can
506: be seen in Fig. 20 -- this demonstrates why (8) is inadequate.
507: Of course, in practice (11) would be more tedious to work with
508: than the simpler additive form (8).
509: 
510: A few words of explanation of Fig. 20 may be in order. The form of
511: a purely linear PLC relation would of course be
512: $$V=a+b\log P+c CI+{\rm error} \; .$$
513: One way of displaying this graphically would be to draw the
514: lines 
515: $$V={\rm constant}$$
516: in the $\log P$-$CI$ plane, for various values of the constant.
517: The equations describing these contour lines are 
518: $$CI=({\rm constant}-a-b\log P)/c \; ,$$
519: i.e. straight lines with slope $-b/c$. Fig. 20, the
520: equivalent for the non-parametric function $f_{PC}$, shows
521: not only that the relations are nonlinear, but also that there
522: is ``interaction'' -- the form of the relation depends on the region
523: of the $\log P$--$(V-R)$ plane it inhabits.
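
As a worked illustration using the linear coefficients in (7) (these
numbers describe the linear fits only, not the contours of $f_{PC}$ in
Fig. 20), the slope $-b/c$ of the lines of constant $V$ would be
$$-\frac{b}{c}=\frac{3.32}{4.00}=0.83 \;\;\;\; {\rm (MACHO)} \; , \;\;\;\;\;\;
-\frac{b}{c}=\frac{3.29}{2.48}\approx 1.33 \;\;\;\; {\rm (OGLE)} \; ,$$
in magnitudes of colour per unit of $\log P$. Under a purely linear PLC
relation the contours would thus be parallel, equally spaced straight
lines; any curvature, or change in slope or spacing from one region of the
plane to another, signals a departure from the linear form.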
524: 
525: \section{CONCLUSIONS \& DISCUSSION}
526: 
527: It should perhaps come as no surprise that with the acquisition of
528: large amounts of new data finer detail in the relationships between
529: astrophysical observables are uncovered. 
530: The best-fitting models of the two datasets are
531: given by (11) (MACHO) and (9) (OGLE) respectively,
532: both of which are nonlinear.
533: 
534: Estimates of the effect of such small non-linearities 
535: on the Cepheid distance scale and
536: on Hubble's constant are given in Ngeow and Kanbur (2006c) and amount to $1$--$2\%$.
537: Such an error seems small but in the
538: era of ``precision cosmology'' with a drive toward a distance scale accurate to
539: $5\%$, such an effect is important. Perhaps just
540: as important, a proper characterization of the precise detail in the observed 
541: phenomena will assist in placing improved constraints
542: on pulsation models of Cepheids and in particular on their ML relations, and 
543: hence on details of stellar evolutionary
544: physics such as the amount of convective core overshoot.
545: 
546: A possible physical explanation for this non-linearity is outlined in the 
547: papers by Kanbur et al. (2004), Kanbur \& Ngeow (2006) and
548: Kanbur et al. (2007b), which studied
549: Galactic, LMC and SMC Cepheid models respectively. Briefly, these papers 
550: suggest the non-linearity is caused by the interaction of the hydrogen
551: ionization front (HIF) and photosphere and the way this interaction varies with 
552: period. At low densities, if the HIF and photosphere are engaged
553: (i.e. the photosphere lies at the base of the HIF) then the temperature of 
554: the photosphere and hence the colour of the star are almost independent
555: of global stellar properties such as the period. Since the relative location 
556: of the photosphere and HIF varies with the $L/M$ ratio, and since this
557: varies with period, modelling has implied that for LMC Cepheids with a period 
558: greater than 10 days, the photosphere and HIF are not engaged. Thus these
559: stars have a different PC relation than their shorter period counterparts.
560: Because the PC and PL relations are really forms of the PLC relation,
561: a change in the PC relation results in a change in the PL relation. 
562: Galactic Cepheids are such that the HIF-photosphere interaction only really 
563: occurs at maximum light at low densities. LMC Cepheids are such that this 
564: HIF-photosphere interaction starts to occur at low densities only
565: for Cepheids with periods greater than 10 days. SMC Cepheids are such that 
566: this HIF-photosphere interaction always occurs at high densities (Kanbur et al. 
567: 2004; Kanbur \& Ngeow 2006; Kanbur et al. 2007b). 
568: 
569: \section*{\large \bf ACKNOWLEDGMENTS}
570: The authors are grateful for the efforts of those who have developed
571: and maintained the {\textsc R} statistical software. SMK acknowledges 
572: support from a small research grant from
573: the American Astronomical Society and the Chretien International research grant. 
574: CN acknowledges financial support 
575: from NSF award OPP-0130612 and a University of Illinois seed funding 
576: award to the Dark Energy Survey.
577: 
578: 
579: \begin{references}
580: Burnham K.P., Anderson D.R., 2002, Model Selection and Multimodel Inference:
581:   a Practical Information-Theoretic Approach (Second Edition). Springer, New York
582: 
583: Cleveland W.S., Devlin S.J., 1988, J. Amer. Stat. Assoc., 83, 597
584: 
585: Conover W.J., 1971, Practical Nonparametric Statistics. John Wiley \&
586:     Sons Inc., New York
587: 
588: Efron B., Tibshirani R.J., 1993, An Introduction to the Bootstrap.
589:     Chapman \& Hall, London
590: 
591: Freedman, W., et al., 2001, ApJ, 553, 47
592: 
593: Kanbur, S. \& Ngeow, C., 2004, MNRAS, 350, 962
594: 
Kanbur, S., Ngeow, C. \& Buchler, R., 2004, MNRAS, 354, 212
596: 
597: Kanbur, S. \& Ngeow, C., 2006, MNRAS, 369, 705
598: 
599: Kanbur, S., Ngeow, C., Nanthakumar, A. \& Stevens, R., 2007a, PASP, 119, 512
600: 
601: Kanbur S., Ngeow C., Feiden G., 2007b, submitted
602: 
603: Macri L., Stanek K., Bersier D., Greenhill L., Reid M., 2006, ApJ, 652, 1133
604: 
Madore B., Freedman W., 1991, PASP, 103, 933
606: 
607: Montgomery D.C., Peck E.A., Vining G.G., 2001, Introduction to Linear
608:      Regression Analysis (Third Edition). John Wiley \& Sons, Inc., New
609:      York
610: 
611: Ngeow, C. \& Kanbur, S., 2005, MNRAS, 360, 1033
612: 
613: Ngeow, C., Kanbur, S., Nikolaev, S., Buonaccorsi, J., Cook, K. \& Welch, D., 
614:    2005, MNRAS, 363, 831
615: 
616: Ngeow, C. \& Kanbur, S., 2006a, MNRAS, 369, 723
617: 
618: Ngeow, C. \& Kanbur, S., 2006b, ApJ, 650, 180
619: 
620: Ngeow, C. \& Kanbur, S., 2006c, ApJ, 642, L29
621: 
622: Sandage, A., 1958, ApJ, 127, 513
623: 
624: Sandage, A., Tammann, G. A. \& Reindl, B., 2004, A\&A, 424, 43
625: 
626: Sebo, K., et al., 2002, ApJS, 142, 71
627: 
Spergel D., et al., 2007, ApJ, in press (arXiv:astro-ph/0603449)
629: 
630: Tammann, G. A. \& Reindl, B., 2002, Astrophys. \& Space Sci., 280, 165
631: 
632: Udalski, A., Soszynski, I., Szymanski, M., Kubiak, M., Pietrzynski, G., 
633:   Wozniak, P., \& Zebrun, K. 1999, Acta Astron., 49, 223
634: 
635: Wood S., 2006, Generalized Additive Models. An Introduction with R.
636:   Chapman \& Hall/CRC, Boca Raton (Fl)
637: \end{references}
638: 
639: 
640: \pagebreak
641: 
644: 
645: \begin{figure}
646: \epsfysize=8.0cm
647: \epsffile{fig1.eps}
648: \caption{MACHO PL data for LMC Cepheids. The line is a linear
649: least squares fit to the data.}
650: \end{figure}
651: 
652: \begin{figure}
653: \epsfysize=8.0cm
654: \epsffile{fig2.eps}
655: \caption{OGLE PL data for LMC Cepheids. The line is a linear
656: least squares fit to the data.}
657: \end{figure}
658: 
659: \begin{figure}
660: \epsfysize=8.0cm
661: \epsffile{fig3.eps}
662: \caption{Partial sums of the residuals from the fit in Fig. 1.} 
663: \end{figure}
664: 
665: \begin{figure}
666: \epsfysize=8.0cm
667: \epsffile{fig4.eps}
668: \caption{Partial sums of the residuals from the fit in Fig. 2.} 
669: \end{figure}
670: 
671: \begin{figure}
672: \epsfysize=8.0cm
673: \epsffile{fig5.eps}
674: \caption{An illustrative loess regression on the OGLE PL data. The window
675: width is 0.05, i.e. 5\% of the range of $\log P$.} 
676: \end{figure}
677: 
678: \begin{figure}
679: \epsfysize=9.0cm
680: \epsffile{fig6.eps}
681: \caption{Cross-validation functions for the loess window width $\alpha$,
682: for the MACHO (top) and OGLE (bottom) data.}
683: \end{figure}
684: 
685: \begin{figure}
686: \epsfysize=8.0cm
687: \epsffile{fig7.eps}
688: \caption{A comparison of the optimal loess fit to the MACHO data, and
689: the linear regression from (1).}
690: \end{figure}
691: 
692: \begin{figure}
693: \epsfysize=8.0cm
694: \epsffile{fig8.eps}
695: \caption{A comparison of the optimal loess fit to the OGLE data, and
696: the linear regression from (1).}
697: \end{figure}
698: 
699: \begin{figure}
700: \epsfysize=8.0cm
701: \epsffile{fig9.eps}
702: \caption{The positions (with respect to the linear regression line) 
703: of the upper and lower 95\% confidence limits
704: on the loess fit to the MACHO data.}
705: \end{figure}
706: 
707: \begin{figure}
708: \epsfysize=8.0cm
709: \epsffile{fig10.eps}
710: \caption{The positions (with respect to the linear regression line) 
711: of the upper and lower 95\% confidence limits
712: on the loess fit to the OGLE data.}
713: \end{figure}
714: 
715: \begin{figure}
716: \epsfysize=8.0cm
717: \epsffile{fig11.eps}
718: \caption{Differences between the linear fit and the loess (black, less smooth)
719: and thin plate regression spline (red, smooth) results for the MACHO data.}
720: \end{figure}
721: 
722: \begin{figure}
723: \epsfysize=8.0cm
724: \epsffile{fig12.eps}
725: \caption{Differences between the linear fit and the loess (black, less smooth)
726: and thin plate regression spline (red, smooth) results for the OGLE data.}
727: \end{figure}
728: 
729: 
730: \begin{figure}
731: \epsfysize=8.0cm
732: \epsffile{fig13.eps}
733: \caption{The 1216 observations constituting the MACHO dataset. Filled squares mark
734: the three points selected for deletion on the basis of residual diagnostics.}
735: \end{figure}
736: 
737: \begin{figure}
738: \epsfysize=8.0cm
739: \epsffile{fig14.eps}
740: \caption{The 723 observations constituting the OGLE dataset. Filled squares mark
741: the six points selected for deletion on the basis of residual diagnostics.}
742: \end{figure}
743: 
744: 
745: \begin{figure}
746: \epsfysize=8.0cm
747: \epsffile{fig15.eps}
\caption{The regression functions $f_P$ [see Eqn. (8)] for the OGLE (top) and
MACHO (bottom) data. The $\pm 2$ standard error confidence limits are plotted
as solid lines: these are indistinguishable from the functions except for
the longer period MACHO data.}
752: \end{figure}
753: 
754: \begin{figure}
755: \epsfysize=8.0cm
756: \epsffile{fig16.eps}
757: \caption{The regression functions $f_C$ [see Eqn. (8)] for the MACHO (left) and
758: OGLE (right) data. The $\pm 2$ standard error confidence limits are plotted
759: as solid lines.}
760: \end{figure}
761: 
762: \begin{figure}
763: \epsfysize=8.0cm
764: \epsffile{fig17.eps}
\caption{The regression function $f_P$ for the MACHO data (see Fig. 15, bottom plot)
prewhitened by a linear fit, in order to show more clearly the deviations from
linearity. The $\pm 2$ standard error bounds are also plotted.}
768: \end{figure}
769: 
770: \clearpage
771: 
772: \begin{figure}
773: \epsfysize=8.0cm
774: \epsffile{fig18.eps}
775: \caption{A loess regression function fitted to the $\log P$--$(V-I)$ data from
776: the OGLE observations. The solid lines are the 95\% confidence envelopes, obtained
777: by bootstrapping.}
778: \end{figure}
779: 
780: \begin{figure}
781: \epsfysize=8.0cm
782: \epsffile{fig19.eps}
783: \caption{A loess regression function fitted to the $\log P$--$(V-R)$ data from
784: the MACHO observations. The solid lines are the 95\% confidence envelopes, obtained
785: by bootstrapping.}
786: \end{figure}
787: 
788: \begin{figure}
789: \epsfysize=13.0cm
790: \epsffile{fig20.ps}
791: \caption{A contour plot of the function $f_{PC}$ in (11) fitted to the MACHO data.
792: The contour values decrease from +1.5 at the top left, in steps of 0.5, to -2 at
793: the extreme right. The $\pm 1$ standard error bounds for each contour line are also
794: shown.} 
795: \end{figure}
796: 
797: 
798: \end{document}
799: