\begin{abstract}
% \me{Need to emphasize the reason why we study adaptive learning rate and why adaptive privacy; adaptive to the unknown parameters: Lipschitz smoothness, universal constant, optimal parameters?}.
% ADP-SGD
% We propose an adaptive differentially private method for empirical risk minimization using gradient perturbation. 
% \todo{I feel the previous sentence is confusing; what is adaptive? How about: We propose a differentially private method for empirical risk minimization using adaptive stochastic gradient perturbation}
% \me{adaptive gradient methods are well-known for AdaGrad or Adam, I don't want people get confused by saying adaptive gradient}

We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization. At each iteration, the random noise added to the gradient is optimally adapted to the current stepsize; we call this process adaptive differentially private (ADP) learning. Under the same privacy budget, we prove that the ADP method considerably improves the utility guarantee compared with the standard differentially private method, which adds vanilla random noise. Our method is particularly useful for gradient-based algorithms with time-varying learning rates, including variants of AdaGrad (Duchi et al., 2011). Extensive numerical experiments demonstrate the effectiveness of the proposed ADP algorithm.
%For adaptive gradient methods, more generally a learning rate with approximately square root decay ${1}/\sqrt{1+t}$ at $t$ iteration, we can improve the convergence bound with ${\log(T)}$ after $T$ iterations while maintaining $(\varepsilon, \delta)-$DP.
\end{abstract}
