\begin{abstract}
Architectural improvements are a core pillar of efficient deep learning. A prominent example is the residual (skip) connection, which has led to significantly better model convergence and quality. Since its introduction, the residual connection has become ubiquitous not only in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs.

In this paper we introduce the \emph{Learned Augmented Residual Layer} (\laurel), a novel generalization of the canonical residual connection, designed as an in-situ replacement for the latter that outperforms it on both model quality and footprint metrics.
Our experiments show that \laurel improves the performance of both vision and language models.
%When pre-training a 3B parameter LLM with \laurel over two weeks with 1024 Cloud TPUv5e chips, we improve its performance on a variety of common LLM tasks and dataset combinations, using only $+0.012\%$ extra parameters and without significant latency changes.
%
%\laurel also outperforms naive model scaling both in terms of model quality and footprint metrics.
For example, on ImageNet 1K with ResNet-50, \laurel achieves $60\%$ of the gain from adding an extra layer while adding only $0.003\%$ more parameters, and matches that gain while adding $2.6\times$ fewer parameters.
\end{abstract}