\begin{abstract}
Multilingual Large Language Models (LLMs) achieve remarkable zero-shot cross-lingual transfer performance.
We hypothesize that this is predicated on their ability to align languages without explicit supervision from parallel sentences.
While representations of translationally equivalent sentences in different languages are known to be similar \emph{after convergence}, it remains unclear how such cross-lingual alignment emerges \emph{during pre-training} of LLMs.
Our study leverages intrinsic probing techniques, which identify the subsets of neurons that encode linguistic features, to correlate the degree of cross-lingual neuron overlap with the zero-shot cross-lingual transfer performance of a given model.
In particular, we rely on checkpoints of BLOOM, a multilingual autoregressive LLM, across different training steps and model scales.
We observe a high correlation between neuron overlap and downstream performance, which supports our hypothesis about the conditions leading to effective cross-lingual transfer.
Interestingly, we also detect a degradation of both implicit alignment and multilingual abilities in certain phases of the pre-training process, providing new insights into multilingual pre-training dynamics.\footnote{Our code is available at: \url{https://github.com/ErikaaWang/probing-multilingual-dynamics}}
\end{abstract}