\begin{abstract}
% Pretraining multilingual language models from scratch requires considerable computational resources and substantial training data.
% Therefore,
Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining.
However, this method usually initializes the embeddings of new subwords randomly and adds substantially more embedding parameters to the model, which reduces efficiency.
To address these issues, we propose a novel framework, \textbf{O}ne \textbf{F}or \textbf{A}ll (\textbf{\textsc{Ofa}}), which judiciously initializes the embeddings of unseen subwords and can thus adapt a PLM to multiple languages efficiently and effectively.
\textsc{Ofa} takes advantage of external well-aligned multilingual static word vectors and injects the alignment knowledge into the subword embeddings. In addition, \textsc{Ofa} applies matrix factorization and replaces the cumbersome embedding matrix with two lower-dimensional matrices, which substantially reduces the number of parameters.
We show that \textsc{Ofa} accelerates the convergence of continued pretraining, which is environmentally friendly because it produces a much smaller carbon footprint.
Through extensive experiments, we demonstrate that \textsc{Ofa} achieves competitive or better performance than the default continued-pretraining baselines on a wide range of crosslingual downstream tasks. We make our code and models publicly available.\footnote{\url{https://github.com/cisnlp/ofa}}
\end{abstract}
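As a rough illustration of why factorizing the embeddings reduces the parameter count, consider a generic low-rank factorization; this is a minimal sketch, the notation is introduced only here, and the exact form used by \textsc{Ofa} may differ. Assume an embedding matrix $\mathbf{E}$ of size $|V| \times D$ is replaced by a matrix $\mathbf{F}$ of size $|V| \times D'$ and a matrix $\mathbf{P}$ of size $D' \times D$, with $D' < D$ and $\mathbf{E} \approx \mathbf{F}\mathbf{P}$. The number of embedding parameters then changes as
\[
|V| \cdot D \;\longrightarrow\; |V| \cdot D' + D' \cdot D .
\]
With purely hypothetical values $|V| = 400{,}000$, $D = 768$, and $D' = 200$ (chosen for illustration, not taken from the experiments), this gives
\[
400{,}000 \times 768 = 307{,}200{,}000
\quad\longrightarrow\quad
400{,}000 \times 200 + 200 \times 768 = 80{,}153{,}600,
\]
i.e., roughly 26\% of the original embedding parameters. Because the $D' \cdot D$ term is small when $|V|$ is large, the savings are governed mainly by the ratio $D'/D$.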
