\begin{abstract}
\noindent \medskip
One of the common challenges in modern data analysis is the presence of missing values. In penalized linear regression,
which has been studied extensively for several decades, missing values introduce bias and can make the estimated covariance matrix of the
covariates non-positive definite, rendering the least-squares loss function non-convex. In this paper, we propose a novel procedure, the linear shrinkage positive
definite (LPD) modification, to address this issue. The LPD modification adjusts the estimated covariance matrix of the covariates so that it is both consistent
and positive definite. With this modified covariance estimator, the penalized regression problem becomes convex,
which facilitates the identification of sparse solutions. Notably, the LPD modification is computationally efficient and has a closed-form expression.
In the presence of missing values, we establish the selection consistency of the $\ell_1$-penalized regression estimator with LPD and derive its convergence rate: its $\ell_2$ error converges at the rate $s_0^{3/2}\sqrt{\log p / n}$, where $s_0$ is the number of nonzero coefficients.
To further evaluate the effectiveness of our approach, we analyze real data from the
Genomics of Drug Sensitivity in Cancer (GDSC) dataset, which contains incomplete measurements of the drug sensitivities of cell lines and of their protein expressions.
We fit a series of penalized linear regression models, each using the sensitivity to one drug as the response variable and the protein expressions as explanatory variables.

\noindent{Keywords:} General missing dependency, lasso, positive definiteness.
\end{abstract}
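As a rough illustration of the problem and of the kind of fix summarized in the abstract, the Python sketch below (not the authors' implementation) first computes a covariance matrix from pairwise-complete observations, which is symmetric but need not be positive semi-definite, and then applies one common form of linear shrinkage toward a scaled identity. We assume the form $\hat\Sigma_\Phi = \Phi\hat\Sigma + (1-\Phi)\mu I$ with $\mu = \operatorname{tr}(\hat\Sigma)/p$, where $\Phi$ solves $\Phi\lambda_{\min}(\hat\Sigma) + (1-\Phi)\mu = \epsilon$, that is, $\Phi = (\mu-\epsilon)/(\mu-\lambda_{\min}(\hat\Sigma))$, for a user-chosen $\epsilon > 0$. The exact LPD estimator in this paper may differ; the function names \texttt{pairwise\_covariance} and \texttt{linear\_shrinkage\_pd} and the tolerance \texttt{eps} are hypothetical.

```python
import numpy as np


def pairwise_covariance(X):
    """Covariance estimate from pairwise-complete observations.

    X is an (n, p) array with np.nan marking missing entries. Entry (j, k)
    uses only the rows where both columns j and k are observed, so the
    result is symmetric but NOT guaranteed to be positive semi-definite.
    (A pair with no jointly observed rows would give nan; not handled here.)
    """
    _, p = X.shape
    observed = ~np.isnan(X)
    S = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            rows = observed[:, j] & observed[:, k]
            xj, xk = X[rows, j], X[rows, k]
            # Biased (divide-by-count) covariance on the jointly observed rows.
            S[j, k] = S[k, j] = np.mean((xj - xj.mean()) * (xk - xk.mean()))
    return S


def linear_shrinkage_pd(S, eps=1e-2):
    """Shrink a symmetric matrix toward mu * I so its smallest eigenvalue is eps.

    Assumed form (hypothetical stand-in for LPD):
        S_phi = phi * S + (1 - phi) * mu * I,  mu = trace(S) / p,
        phi = (mu - eps) / (mu - lambda_min(S))  if lambda_min(S) < eps.
    Returns the modified matrix and the shrinkage weight phi.
    """
    p = S.shape[0]
    mu = np.trace(S) / p
    lam_min = np.linalg.eigvalsh(S)[0]  # eigvalsh returns eigenvalues in ascending order
    if lam_min >= eps:
        return S.copy(), 1.0  # smallest eigenvalue already >= eps: leave S unchanged
    if mu <= eps:
        raise ValueError("trace(S)/p must exceed eps for this shrinkage form")
    phi = (mu - eps) / (mu - lam_min)
    return phi * S + (1.0 - phi) * mu * np.eye(p), phi


# Toy symmetric matrix with a negative eigenvalue (eigenvalues 1.9, 1.9, -0.8),
# the kind of indefinite estimate that pairwise deletion can produce.
S = np.array([[1.0, 0.9, 0.9],
              [0.9, 1.0, -0.9],
              [0.9, -0.9, 1.0]])
S_pd, phi = linear_shrinkage_pd(S, eps=1e-2)
print(np.linalg.eigvalsh(S)[0])     # approximately -0.8
print(np.linalg.eigvalsh(S_pd)[0])  # approximately 0.01, i.e. eps
print(phi)                          # approximately 0.55
```

In this assumed form, $\Phi$ rescales the off-diagonal entries and pulls the diagonal toward $\mu$, so the zero pattern and the trace of $\hat\Sigma$ are preserved, and the only nontrivial computation is the smallest eigenvalue.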