\begin{abstract}
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, in which adversarial perturbations to the inputs can change or manipulate the classification.
To defend against such attacks, an effective and popular approach known as \textit{adversarial training (AT)} has been shown to mitigate the negative impact of adversarial attacks by means of a min-max robust training method.
Although AT is effective, it remains unclear whether it can be successfully adapted to the distributed learning context.
The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets.
Spurred by this, we propose \textit{distributed adversarial training (DAT)}, a \textit{large-batch} adversarial training framework implemented over multiple machines.
We show that DAT is general: it supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization.
Theoretically, we establish, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings.
Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., with ResNet-50 on ImageNet). Code is available at \url{https://github.com/dat-2022/dat}.
\end{abstract}