abstract:7d58340fa3924057.tex

1: \begin{abstract}

2: \noindent \medskip

3:

4: Missing values are a common challenge in modern data analysis. In the context of penalized linear regression,

5: which has been studied extensively for several decades, missing values introduce bias and yield an estimated covariance matrix of the

6: covariates that is not positive definite, rendering the least-squares loss function non-convex. In this paper, we propose a novel procedure called the linear shrinkage positive

7: definite (LPD) modification to address this issue. The LPD modification adjusts the estimated covariance matrix of the covariates so that it is both consistent

8: and positive definite. Using the modified covariance estimator, we transform the penalized regression problem into a convex one,

9: thereby facilitating the identification of sparse solutions. Notably, the LPD modification has a closed-form expression and is computationally efficient.

10: In the presence of missing values, we establish selection consistency and derive the convergence rate of the $\ell_1$-penalized regression estimator with LPD, showing an $\ell_2$-error convergence rate of $\sqrt{\log p / n}$ multiplied by a factor of $s_0^{3/2}$, where $s_0$ denotes the number of non-zero coefficients.

11: To evaluate the effectiveness of our approach in practice, we analyze real data from the

12: Genomics of Drug Sensitivity in Cancer (GDSC) dataset, which provides incomplete measurements of drug sensitivities of cell lines and their protein expressions.

13: We fit a series of penalized linear regression models, with each drug sensitivity value serving as the response variable and the protein expressions as explanatory variables.

14:

15:

16:

17: \noindent{Keywords:} General missing dependency, lasso, positive definiteness.

18:

19: \end{abstract}

20: