1: \begin{abstract}
2:
3: % Pretraining multilingual language models from scratch requires considerable computational resources and substantial training data.
4: % Therefore,
5: Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary
6: extension and continued pretraining.
7: However, this method typically initializes the embeddings of new subwords randomly and adds substantially more embedding parameters to the model, which reduces efficiency.
8: To address these issues,
9: we propose a novel framework: \textbf{O}ne \textbf{F}or \textbf{A}ll (\textbf{\textsc{Ofa}}), which initializes the embeddings of unseen subwords in an informed way and can thus adapt a PLM to multiple languages efficiently and effectively.
10: \textsc{Ofa} takes advantage of external well-aligned multilingual static word vectors and injects the alignment knowledge into the subword embeddings. In addition, \textsc{Ofa}
11: applies matrix factorization and replaces the cumbersome embedding matrix with two lower-dimensional matrices,
12: which substantially reduces the number of parameters.
13: We show that \textsc{Ofa} accelerates the convergence of continued pretraining,
14: which is environmentally friendly because it produces a much smaller carbon footprint.
15: Through extensive experiments, we demonstrate that \textsc{Ofa} achieves performance competitive with or better than default continued pretraining baselines on a wide range of crosslingual downstream tasks. We make our code and models publicly available.\footnote{\url{https://github.com/cisnlp/ofa}}
16: \end{abstract}
17: