abstract:e2566c10e06d82ef.tex

1: \begin{abstract}

2: Multilingual Large Language Models (LLMs) achieve remarkable levels of zero-shot cross-lingual transfer performance.

3: We speculate that this transfer ability is predicated on their capacity to align languages without explicit supervision from parallel sentences.

4: Although representations of translationally equivalent sentences in different languages are known to be similar \emph{after convergence}, it remains unclear how such cross-lingual alignment emerges \emph{during pre-training} of LLMs.

5: Our study leverages intrinsic probing techniques, which identify the subsets of neurons that encode linguistic features, to correlate the degree of cross-lingual neuron overlap with the zero-shot cross-lingual transfer performance of a given model.

6: In particular, we rely on checkpoints of BLOOM, a multilingual autoregressive LLM, across different training steps and model scales.

7: We observe a high correlation between neuron overlap and downstream performance, which supports our hypothesis on the conditions leading to effective cross-lingual transfer.

8: Interestingly, we also detect a degradation of both implicit alignment and multilingual abilities in certain phases of the pre-training process, providing new insights into multilingual pre-training dynamics.\footnote{Our code is available at: \url{https://github.com/ErikaaWang/probing-multilingual-dynamics}}

9: \end{abstract}

10: