0d4534fe94ca32c6.tex
1: \begin{abstract}
2: The one-epoch overfitting phenomenon has been widely observed in industrial Click-Through Rate (CTR) applications, where model performance degrades significantly at the beginning of the second epoch.
3: Recent studies have tried to identify the factors underlying this phenomenon through extensive experiments.
4: However, it is still unknown whether a multi-epoch training paradigm could achieve better results, as the best performance is usually achieved by one-epoch training.
5: In this paper, we hypothesize that the emergence of this phenomenon may be attributed to the susceptibility of the embedding layer to overfitting, which can stem from the high-dimensional sparsity of data.
6: To maintain feature sparsity while avoiding overfitting of the embeddings, we propose a novel Multi-Epoch learning with Data Augmentation (MEDA) method, which can be applied directly to most deep CTR models.
7: MEDA achieves data augmentation by reinitializing the embedding layer in each epoch, thereby avoiding embedding overfitting and simultaneously improving convergence.
8: To the best of our knowledge, MEDA is the first multi-epoch training paradigm designed for deep CTR prediction models.
9: Extensive experiments on several public datasets verify the effectiveness of MEDA.
10: Notably, the results show that MEDA can significantly outperform conventional one-epoch training.
11: In addition, MEDA has shown significant benefits in a real-world scenario at Kuaishou.
12: \end{abstract}
13: