\begin{abstract}
There has been increasing demand for privacy-preserving methodologies in modern statistics and machine learning.
Differential privacy, a mathematical notion from computer science, is an increasingly popular tool offering robust privacy guarantees.
Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, and typically does not account for nontrivial upstream pre-processing.
An important example is record linkage performed prior to downstream modeling.
Record linkage refers to the statistical task of linking two or more data sets on the same group of entities in the absence of a unique identifier.
This probabilistic procedure brings additional uncertainty to the subsequent task.
In this paper, we present two differentially private algorithms for linear regression with linked data.
In particular, we propose a noisy gradient method and a sufficient statistics perturbation approach for the estimation of regression coefficients.
We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, which allows us to understand the relative contributions of linkage error, estimation error, and the cost of privacy.
The variances of the estimators are also discussed.
We demonstrate the performance of the proposed algorithms through simulations and an application to synthetic data.

\end{abstract}