1: \begin{abstract}
2: We propose a new and computationally efficient algorithm for maximizing the
3: observed log-likelihood for a multivariate normal data matrix with missing
4: values. We show that our procedure, based on iteratively regressing the
5: missing on the observed variables, generalizes the traditional EM
6: algorithm by alternating between different complete data spaces and
7: performing the E-Step incrementally. In this non-standard setup, we prove
8: numerical convergence to a stationary point
9: of the observed log-likelihood.
10: 
11: For high-dimensional data, where the number of variables may greatly exceed
12: sample size, we add a Lasso penalty in the regression part of our algorithm
13: and perform coordinate descent approximations. This leads to a
14: computationally very attractive technique with sparse regression
15: coefficients for missing data imputation. Simulations and results on four
16: microarray datasets show that the new method often outperforms alternative
17: imputation techniques such as $k$-nearest neighbors imputation, nuclear norm
18: minimization, or a penalized likelihood approach with an $\ell_1$-penalty on
19: the inverse covariance matrix.\vspace{0.5cm}\\
20: {\bf Keywords} {Missing data, observed likelihood, (partial) E- and M-Step,
21:   Lasso, matrix completion} 
22: \end{abstract}
23: