\begin{abstract}
Recent studies have shown that cross-lingual language model pretraining \cite{Lample2019CrosslingualLM}, especially Translation Language Modeling (TLM), noticeably improves the performance of neural machine translation.
To alleviate TLM's need for expensive parallel corpora, we incorporate translation information from dictionaries into the pretraining process and propose a novel Bilingual Dictionary-based Language Model (BDLM).
% We evaluate our BDLM on WMT-News'19 Zh-En \cite{tiedemann2012parallel}, WMT'20 news-commentary Zh-En, and WMT'16 Ro-En.
We evaluate BDLM on Chinese, English, and Romanian.
% Pretrained on these combined corpora, the BDLM is then fine-tuned for NMT.
For Chinese-English, we obtain 55.0 BLEU on WMT-News'19 \cite{tiedemann2012parallel} and 24.3 BLEU on WMT'20 news-commentary, outperforming the vanilla Transformer \cite{vaswani2017attention} by more than 8.4 and 2.3 BLEU, respectively.
Our results also indicate that BDLM has advantages in convergence speed and in predicting rare words.
The BLEU improvement on WMT'16 Romanian-English further shows its effectiveness for low-resource language translation.
% Our code and pretrained models will be publicly available.
\end{abstract}
