73c885fab0bf8496.tex
1: \begin{abstract}
2: 
3: % Pretraining multilingual language models from scratch requires considerable computational resources and substantial training data.
4: % Therefore, 
5: Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary 
6: extension and continued pretraining. 
7: However, this method usually initializes the embeddings of new subwords randomly and adds substantially more embedding parameters to the model, which reduces efficiency.
8: To address these issues, 
9: we propose a novel framework: \textbf{O}ne \textbf{F}or \textbf{A}ll (\textbf{\textsc{Ofa}}), which initializes the embeddings of unseen subwords in an informed way and can thus adapt a PLM to multiple languages efficiently and effectively.
10: \textsc{Ofa} takes advantage of external well-aligned multilingual static word vectors and injects the alignment knowledge into the subword embeddings. In addition, \textsc{Ofa} 
11: applies matrix factorization and replaces the large embedding matrix with two lower-dimensional matrices, 
12: which substantially reduces the number of parameters. 
13: We show that \textsc{Ofa} accelerates the convergence of continued pretraining, 
14: which is environmentally friendly because it leaves a much smaller carbon footprint. 
15: Through extensive experiments, we demonstrate that \textsc{Ofa} achieves performance competitive with or better than default continued pretraining baselines on a wide range of crosslingual downstream tasks. We make our code and models publicly available.\footnote{\url{https://github.com/cisnlp/ofa}}
16: \end{abstract}
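
% Illustrative sketch only: the notation below ($V$, $D$, $D'$, $\mathbf{E}$, $\mathbf{F}$, $\mathbf{P}$) is assumed for exposition and is not taken from the abstract.
The parameter saving from factorizing the embeddings can be sketched as follows, under assumed notation. Let $V$ be the size of the extended vocabulary, $D$ the embedding dimension of the PLM, and $D' < D$ a lower dimension. A full embedding matrix $\mathbf{E} \in \mathbb{R}^{V \times D}$ is approximated by the product of two smaller matrices:
\[
  \mathbf{E} \approx \mathbf{F}\,\mathbf{P},
  \qquad
  \mathbf{F} \in \mathbb{R}^{V \times D'},
  \quad
  \mathbf{P} \in \mathbb{R}^{D' \times D}.
\]
The number of embedding parameters then changes from $VD$ to $VD' + D'D$, which is a reduction whenever
\[
  VD' + D'D < VD
  \quad\Longleftrightarrow\quad
  D' < \frac{VD}{V + D}.
\]
As a purely hypothetical example, $V = 400{,}000$, $D = 768$, and $D' = 400$ give $VD = 307{,}200{,}000$ parameters for the full matrix versus $VD' + D'D = 160{,}307{,}200$ for the factorized form.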
17: