\begin{abstract}
There has been increasing demand for establishing privacy-preserving methodologies for modern statistics and machine learning.
Differential privacy, a mathematical notion from computer science, is an increasingly popular tool that offers robust privacy guarantees.
Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, with nontrivial upstream pre-processing typically not incorporated.
An important example is when record linkage is done prior to downstream modeling.
Record linkage refers to the statistical task of linking two or more data sets of the same group of entities without a unique identifier.
This probabilistic procedure brings additional uncertainty to the subsequent task. In this paper, we present two differentially private algorithms for linear regression with linked data.
In particular, we propose a noisy gradient method and a sufficient statistics perturbation approach for the estimation of regression coefficients.
We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, which allow us to understand the relative contributions of linkage error, estimation error, and the cost of privacy.
The variances of the estimators are also discussed.
We demonstrate the performance of the proposed algorithms through simulations and an application to synthetic data.
\end{abstract}