\begin{abstract}

Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization that learns flat optima for better generalization at no extra cost in deep neural network (DNN) optimization.
Despite achieving better flatness, existing WA methods may reach worse final performance or require extra test-time computation.
This work unveils the full potential of EMA with \textit{a single line of modification}, \textit{i.e.}, switching the EMA parameters to the original model after each epoch, a method we dub Switch EMA (SEMA).
From both theoretical and empirical aspects, we demonstrate that SEMA helps DNNs reach generalization optima that better trade off flatness and sharpness.
To verify the effectiveness of SEMA, we conduct comparison experiments on discriminative, generative, and regression tasks across vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling.
Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training, improving performance and speeding up convergence.
% Comprehensive experiments on vision and language tasks with popular optimizers and networks show that Switch EMA is a free lunch for optimizing DNNs with better performances and convergence speeds.

\end{abstract}
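% Illustrative sketch only: the symbols below are assumed notation, not taken from the source.
As a minimal sketch of the modification described in the abstract, under notation assumed here for illustration, let $\theta_t$ denote the model parameters after optimizer step $t$, $\tilde{\theta}_t$ the EMA parameters, $\alpha \in (0,1)$ the EMA momentum coefficient, and $T$ the number of optimizer steps per epoch. A standard EMA update can be written as
\[
  \tilde{\theta}_t = \alpha\,\tilde{\theta}_{t-1} + (1-\alpha)\,\theta_t ,
\]
and the switch step would then overwrite the model with the averaged weights at the end of each epoch $k$:
\[
  \theta_{kT} \leftarrow \tilde{\theta}_{kT}, \qquad k = 1, 2, \dots
\]
Under these assumptions, plain EMA corresponds to omitting the second step, so that the optimizer always continues from its own iterate rather than from the averaged one.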