abstract:fe6e240bc5f4e2c9.tex

1: \begin{abstract}

2:

3: Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization that learns flat optima for better generalization without extra cost in deep neural network (DNN) optimization.

4: Despite achieving better flatness, existing WA methods may end up with worse final performance or require extra test-time computation.

5: This work unveils the full potential of EMA with \textit{a single line of modification}, \textit{i.e.}, switching the EMA parameters to the original model after each epoch, dubbed Switch EMA (SEMA).

6: From both theoretical and empirical perspectives, we demonstrate that SEMA helps DNNs reach generalization optima that strike a better trade-off between flatness and sharpness.

7: To verify the effectiveness of SEMA, we conduct comparative experiments on discriminative, generative, and regression tasks across vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling.

8: Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training, improving performance and accelerating convergence.

9: % Comprehensive experiments on vision and language tasks with popular optimizers and networks show that Switch EMA is a free lunch for optimizing DNNs with better performances and convergence speeds.

10:

11: \end{abstract}

12: