\begin{abstract}
Truncated linear regression is a classical challenge in statistics, wherein
a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are observed only if the label
falls in some subset $S \subseteq \mathbb{R}$; otherwise, the existence of the
pair $(x, y)$ is hidden from observation. In its general form, linear regression with truncated observations has remained a challenge since the early
works of~\cite{tobin1958estimation,amemiya1973regression}. When the
error is normally distributed with known variance, the recent work of~\cite{daskalakis2019computationally} provides computationally and statistically efficient estimators of the linear model, $w$. In this paper, we provide the first computationally and statistically efficient
estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.

% We provide two estimators, using two different instantiations Projected Stochastic Gradient Descent (PSGD), with or without replacement, on the negative log-likelihood of the truncated sample. Each estimator attains better convergence rates in different ranges of parameters. % and therefore we provide the first estimation for the truncated linear
% regression problem that also has explicit confidence region. The method that
% we present in this paper and hence be used not only for estimation but also for
% inference.

\end{abstract}
