\begin{abstract}
We propose a new and computationally efficient algorithm for maximizing the
observed log-likelihood for a multivariate normal data matrix with missing
values. We show that our procedure, based on iteratively regressing the
missing on the observed variables, generalizes the traditional EM
algorithm by alternating between different complete data spaces and
performing the E-Step incrementally. In this non-standard setup, we prove
numerical convergence to a stationary point
of the observed log-likelihood.% in a non-standard setup.


For high-dimensional data, where the number of variables may greatly exceed
the sample size, we add a Lasso penalty in the regression part of our algorithm
and perform coordinate descent approximations. This leads to a
computationally very attractive technique with sparse regression
coefficients for missing data imputation. Simulations and results on four
microarray datasets show that the new method often outperforms alternative
imputation techniques such as $k$-nearest neighbors imputation, nuclear norm
minimization, or a penalized likelihood approach with an $\ell_1$-penalty on
the inverse covariance matrix.\vspace{0.5cm}\\

{\bf Keywords} {Missing data, observed likelihood, (partial) E- and M-Step,
  Lasso, matrix completion}
\end{abstract}
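
% Illustrative sketch, not taken from the paper: all notation below is assumed.
To make the regression step mentioned in the abstract concrete, the following
is a minimal sketch under standard multivariate normal theory. The symbols
$x_o$, $x_m$, $\mu$, $\Sigma$, $y_j$, $X_o$, $\beta$ and $\lambda$ are
assumptions introduced here for illustration; the abstract itself fixes no
notation. If a row $x = (x_o, x_m)$, with observed part $x_o$ and missing part
$x_m$, follows $\mathcal{N}(\mu, \Sigma)$, then the conditional mean of the
missing part is linear in the observed part:
\[
  \mathrm{E}[x_m \mid x_o] = \mu_m + \Sigma_{mo}\Sigma_{oo}^{-1}(x_o - \mu_o),
\]
so $\Sigma_{mo}\Sigma_{oo}^{-1}$ plays the role of the coefficients of a
regression of the missing on the observed variables. A Lasso-penalized
regression for a single variable $j$ could, under these assumptions, take the
generic form
\[
  \hat{\beta} = \arg\min_{\beta}\;
    \frac{1}{2}\,\| y_j - X_o \beta \|_2^2 + \lambda \|\beta\|_1 ,
\]
where $y_j$ collects the centered values of variable $j$, $X_o$ the
corresponding centered observed variables, and $\lambda \ge 0$ controls the
sparsity of $\hat{\beta}$. The exact objective, and the way rows with
different missingness patterns are handled, are not specified in the abstract
and may differ from this sketch.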
