\begin{abstract}
  While significant progress has been made separately on analytics systems
  for scalable stochastic gradient descent (SGD) and private SGD,
  none of the major scalable analytics frameworks have incorporated
  differentially private SGD.
  Two interrelated issues underlie this disconnect between research
  and practice: (1) low model accuracy due to the noise added to guarantee privacy,
  and (2) high development and runtime overhead of the private algorithms.
  This paper takes a first step to remedy this disconnect and
  proposes a private SGD algorithm to address {\em both} issues
  in an integrated manner.
  In contrast to the white-box approach adopted by previous work,
  we revisit and use the classical technique of {\em output perturbation} to
  devise a novel ``bolt-on'' approach to private SGD.
  While our approach trivially addresses (2), it makes (1) even more challenging.
  We address this challenge by providing a novel analysis of the $L_2$-sensitivity of SGD,
  which allows, under the same privacy guarantees, better convergence of SGD
  when only a constant number of passes can be made over the data.
  We integrate our algorithm, as well as other state-of-the-art differentially private SGD algorithms,
  into \Bismarck{}, a popular scalable SGD-based analytics system on top of an RDBMS.
  Extensive experiments show that our algorithm can be easily integrated, incurs virtually
  no overhead, scales well, and, most importantly, yields substantially better (up to $4\times$)
  test accuracy than the state-of-the-art algorithms on many real datasets.
\end{abstract}