\begin{abstract}
Ultra-high dimensional longitudinal data are increasingly common,
and their analysis is challenging both theoretically and methodologically.
We offer a new automatic procedure for finding
a sparse semivarying coefficient model, a model class that is widely accepted for
longitudinal data analysis.
Our proposed method first reduces the number of covariates to a moderate
order by employing a screening procedure, and then identifies both
the varying and \mbox{constant} coefficients using a group SCAD estimator,
which is subsequently refined by
accounting for the within-subject correlation.
The screening procedure is based on working independence and B-spline
marginal models.
Under weaker conditions than those in the literature, we show that with
high probability only
irrelevant variables will be screened out, and the number of selected
variables can be bounded by a moderate order. This allows the subsequent
structure identification step to attain the desirable
sparsity and oracle properties. %It also marks the significance of our theory and
%methodology,
Existing methods require some form of iterative screening in
order to achieve this;
they therefore demand heavy computational effort, and consistency is not
guaranteed. %We prove that our group SCAD estimator detects the
%constant and varying effects simultaneously.
The refined estimator of the semivarying coefficient model employs
profile least squares, local linear smoothing and nonparametric
covariance estimation,
and is semiparametrically efficient.
We also suggest ways to implement
the proposed methods and to select the tuning parameters. An extensive
simulation study
is summarized to demonstrate their finite-sample performance, and the
yeast cell cycle data are analyzed.
\end{abstract}