\begin{abstract}
Neural Machine Translation (NMT) models are typically trained on heterogeneous data that are concatenated and randomly shuffled. However, not all training data are equally useful to the model. Curriculum training aims to present the data to NMT models in a meaningful order. In this work, we introduce a two-stage curriculum training framework for NMT in which a base NMT model is fine-tuned on subsets of data selected by two kinds of scoring: deterministic scoring with pre-trained methods, and online scoring based on the prediction scores of the emerging NMT model. Through comprehensive experiments on six WMT'21 language pairs covering both low- and high-resource languages, we show that our curriculum strategies consistently yield better translation quality (up to +2.2 BLEU) and faster convergence (approximately 50\% fewer updates).
\end{abstract}