\begin{abstract}

There is increasing demand for privacy-preserving methodologies in modern statistics and machine learning.
Differential privacy, a mathematical notion from computer science, has emerged as a tool offering robust privacy guarantees.
Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, typically without incorporating nontrivial upstream pre-processing.
An important example is record linkage performed prior to downstream modeling.
Record linkage refers to the statistical task of linking two or more data sets on the same group of entities in the absence of a unique identifier.
This probabilistic procedure introduces additional uncertainty into the subsequent analysis.
In this paper, we present two differentially private algorithms for linear regression with linked data.
Specifically, we propose a noisy gradient method and a sufficient statistics perturbation approach for estimating the regression coefficients.
We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, which allow us to understand the relative contributions of linkage error, estimation error, and the cost of privacy.
We also discuss the variances of the estimators.
We demonstrate the performance of the proposed algorithms through simulations and an application to synthetic data.

\end{abstract}
