\begin{abstract}
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.
Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training.
However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks.
Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension.
In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought.
Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA of 81.4\% on CIFAR-10 without extra data under $\mathbf{(8, 10^{-5})}$-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7\%.
When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8\% top-1 accuracy on ImageNet under $\mathbf{(0.5, 8 \cdot 10^{-7})}$-DP. Additionally, we achieve 86.7\% top-1 accuracy under $\mathbf{(8, 8 \cdot 10^{-7})}$-DP, only 4.3\% below the current non-private SOTA for this task.
We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
\end{abstract}