abstract:6afa984de4e1ef50.tex

1: \begin{abstract}

2: Large recommendation models (LRMs) are fundamental to the multi-billion dollar online advertising industry, processing massive datasets of hundreds of billions of examples before transitioning to continuous online training to adapt to rapidly changing user behavior \cite{anil2022factory}. The massive scale of data directly impacts both computational costs and the speed at which new methods can be evaluated (R\&D velocity).

3:

4: This paper presents actionable principles and high-level frameworks to guide practitioners in optimizing training data requirements. These strategies have been successfully deployed in Google's largest Ads CTR prediction models \cite{anil2022factory, mcmahan2013ad} and are broadly applicable beyond LRMs. We outline the concept of \emph{data convergence}, describe methods to accelerate this convergence, and finally, detail how to optimally balance training data volume with model size.

5: \end{abstract}

6: