\begin{abstract}
Adversarial training is one of the most effective methods for improving the robustness of pre-trained language models (PLMs).
However, it is typically more expensive than standard fine-tuning because adversarial examples must be generated via gradient descent.
Examining the optimization process of adversarial training, we find that robust connectivity patterns emerge in the early training phase (typically $0.15\sim0.3$ epochs), long before the parameters converge.
Inspired by this finding, we extract robust early-bird tickets (i.e., subnetworks) to develop an efficient two-stage adversarial training method: (1) searching for robust tickets with structured sparsity in the early stage;
(2) fine-tuning the robust tickets for the remainder of training.
To extract the robust tickets as early as possible, we design a ticket convergence metric that automatically terminates the search.
Experiments show that the proposed method achieves up to $7\times \sim 13\times$ training speedups while maintaining robustness comparable to, or even better than, that of the most competitive state-of-the-art adversarial training methods.
\end{abstract}