\begin{abstract}
% $n$ input-output samples under quadratic loss
%Results apply without assuming a ground truth parameter that relates input and output
We study the problem of finding the linear model that minimizes the least-squares loss over a given dataset. While this problem is trivial in the low-dimensional regime, it becomes more interesting in high dimensions, where the population minimizer is assumed to lie on a manifold such as the set of sparse vectors. We propose a projected gradient descent~(PGD) algorithm to estimate the population minimizer in the finite-sample regime. We establish a linear convergence rate and data-dependent estimation error bounds for PGD. Our contributions include: 1) The results hold for heavier-tailed sub-exponential distributions in addition to sub-Gaussian ones. 2) We directly analyze empirical risk minimization and do not require a realizable model that connects the input data and the labels. 3) Our PGD algorithm is augmented to learn the bias terms, which improves performance. Numerical experiments validate our theoretical results.
%directly study the regularized empirical risk and 
\end{abstract}