\begin{abstract}
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, in which adversarial perturbations to the inputs can change or manipulate the classification.
To defend against such attacks, \textit{adversarial training (AT)}, an effective and popular approach, has been shown to mitigate the negative impact of adversarial attacks by means of min-max robust training.
Although AT is effective, it remains unclear whether it can be successfully adapted to the distributed learning setting.
Distributed optimization over multiple machines
makes it possible to scale up robust training to large models and datasets. Motivated by this,
we propose
\textit{distributed adversarial training (DAT)},
a \textit{large-batch} adversarial training framework implemented over multiple machines. We show that DAT is general: it supports training over labeled and unlabeled data,
multiple types of attack generation methods, and gradient compression operations favored in distributed optimization.
Theoretically, we provide, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training
speedup (e.g., on ResNet-50 under ImageNet). Code is available at \url{https://github.com/dat-2022/dat}.
\end{abstract}