\begin{abstract}
Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence, owing to its ability to handle re-centering and re-scaling of both inputs and weight matrices.
However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, particularly recurrent neural networks (RNNs).
In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or \textit{RMSNorm}. RMSNorm regularizes the summed inputs to a neuron in one layer according to their root mean square (RMS), which gives the model re-scaling invariance and an implicit learning rate adaptation ability.
Because it omits the mean computation, RMSNorm is computationally simpler and thus more efficient than LayerNorm.
We also present partial RMSNorm, or \textit{$p$RMSNorm}, in which the RMS is estimated from $p$\% of the summed inputs without breaking the above properties.
Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves performance comparable to LayerNorm while reducing running time by 7\%--64\% across different models. Source code is available at \url{https://github.com/bzhangGo/rmsnorm}.
\end{abstract}
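To make the contrast between the two normalizers concrete, here is a minimal sketch (not the released implementation). RMSNorm rescales the summed inputs $\mathbf{a} \in \mathbb{R}^n$ as $\bar{a}_i = \frac{a_i}{\mathrm{RMS}(\mathbf{a})} g_i$ with $\mathrm{RMS}(\mathbf{a}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^2}$, whereas LayerNorm first subtracts the mean and then divides by the standard deviation. The Python below assumes plain lists, a gain vector $g$ defaulting to ones, and a small stabilizer \texttt{eps} that is our own addition; the function names \texttt{rms\_norm} and \texttt{layer\_norm} are hypothetical.

```python
import math
from typing import List, Optional


def rms_norm(a: List[float], gain: Optional[List[float]] = None,
             eps: float = 1e-8) -> List[float]:
    """RMSNorm sketch: divide the summed inputs by their root mean square.

    `gain` defaults to ones; `eps` is an assumed stabilizer, not from the text.
    """
    n = len(a)
    if n == 0:
        raise ValueError("need at least one summed input")
    gain = gain if gain is not None else [1.0] * n
    # A single statistic: the mean of the squared inputs (no mean subtraction).
    rms = math.sqrt(sum(x * x for x in a) / n + eps)
    return [g * x / rms for g, x in zip(gain, a)]


def layer_norm(a: List[float], gain: Optional[List[float]] = None,
               bias: Optional[List[float]] = None,
               eps: float = 1e-8) -> List[float]:
    """LayerNorm sketch for comparison: re-center by the mean, re-scale by the std."""
    n = len(a)
    if n == 0:
        raise ValueError("need at least one summed input")
    gain = gain if gain is not None else [1.0] * n
    bias = bias if bias is not None else [0.0] * n
    mean = sum(a) / n
    var = sum((x - mean) ** 2 for x in a) / n
    std = math.sqrt(var + eps)
    return [g * (x - mean) / std + b for g, x, b in zip(gain, a, bias)]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    scaled = [10.0 * x for x in a]   # same direction, larger magnitude
    shifted = [x + 5.0 for x in a]   # same spread, different mean
    # Both normalizers are (up to eps) invariant to re-scaling ...
    print(rms_norm(a), rms_norm(scaled))
    # ... but only LayerNorm is invariant to re-centering.
    print(layer_norm(a), layer_norm(shifted))
    print(rms_norm(a), rms_norm(shifted))
```

Running the script prints near-identical vectors for the original and scaled inputs under both normalizers, but matching vectors for the original and shifted inputs only under LayerNorm. This illustrates the trade-off: RMSNorm gives up re-centering invariance, keeps re-scaling invariance, and needs only the sum of squares rather than both a mean and a variance.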
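For $p$RMSNorm, the following is a sketch under the assumption that the RMS is estimated from the first $k = \lceil n \cdot p \rceil$ summed inputs, $\overline{\mathrm{RMS}}(\mathbf{a}) = \sqrt{\frac{1}{k}\sum_{i=1}^{k} a_i^2}$, and then applied to all $n$ inputs. Here $p$ is passed as a fraction in $(0, 1]$ rather than a percentage; the gain vector is omitted, and the helper names \texttt{rms\_sample\_size} and \texttt{partial\_rms\_norm}, the stabilizer \texttt{eps}, and the rounding guard are our own illustrative choices.

```python
import math
from typing import List


def rms_sample_size(n: int, p: float) -> int:
    """Number of leading inputs used for the RMS estimate, k = ceil(n * p).

    Rounding n * p first (our own guard) avoids float noise: 100 * 0.07
    evaluates slightly above 7, which a bare ceil would turn into 8.
    """
    return max(1, math.ceil(round(n * p, 9)))


def partial_rms_norm(a: List[float], p: float, eps: float = 1e-8) -> List[float]:
    """pRMSNorm sketch: estimate the RMS from the first k inputs, normalize all n.

    `p` is a fraction in (0, 1]; `eps` is an assumed stabilizer.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    n = len(a)
    if n == 0:
        raise ValueError("need at least one summed input")
    k = rms_sample_size(n, p)
    rms = math.sqrt(sum(x * x for x in a[:k]) / k + eps)
    # Every element shares one estimate, so scaling `a` by c > 0 still cancels out.
    return [x / rms for x in a]


if __name__ == "__main__":
    a = [3.0, 4.0, 0.0, 0.0]
    print(rms_sample_size(len(a), 0.5))   # uses the first 2 of 4 inputs
    print(partial_rms_norm(a, 0.5))
    print(partial_rms_norm([7.0 * x for x in a], 0.5))
```

Because all outputs are divided by the same estimate, a positive rescaling of the inputs still cancels, so re-scaling invariance is preserved; the saving is that only $k$ squared terms are accumulated instead of $n$.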