abstract:a9f46095926e2abe.tex

1: \begin{abstract}

2: %   Bootstrap Your Own Latent (\byol) is self-supervised learning that recently obtained close to state-of-the-art results in image representation learning. Unlike many traditional contrastive methods, \byol does not require the presence of negative pairs in the batch loss. Those were previously thought to be absolutely necessary in order to prevent the learning of uninformative or collapsed representations. Thus, it has been hypothesized by the community that \byol carries an implicit negative (contrastive) term that stems from the use of batch normalization. We exhibit empirical evidence that strongly refutes those claims, and, in particular, show that replacing batch normalization with a combination of group normalization and weight standardization, \emph{ceteris paribus}, yields performance comparable to that of original \byol ($73.2 \%$ as opposed to $74.3\%$ top-1 accuracy, with ResNet-50 features, under the linear evaluation protocol on ImageNet). \emph{This suggests that the key to \byol's success does not solely lie with the normalization techniques used, and that other phenomena are at play in its convergence.}

3: % \end{abstract}

4: