\begin{abstract}
This paper presents a simple unsupervised visual representation learning method with a pretext task of discriminating all images in a dataset using a parametric, instance-level classifier.
The overall framework is a replica of a supervised classification model, where \textit{semantic classes} (e.g., \textit{dog, bird,} and \textit{ship}) are replaced by \textit{instance IDs}.
However, scaling up the classification task from thousands of \textit{semantic labels} to millions of \textit{instance labels} brings specific challenges, including 1) the large-scale softmax computation; 2) slow convergence, because each instance sample is visited infrequently; and 3) the massive number of negative classes, which can be noisy.
This work presents several novel techniques to handle these difficulties.
First, we introduce a hybrid parallel training framework to make large-scale training feasible.
Second, we present a raw-feature initialization mechanism for the classification weights, which we hypothesize offers a contrastive prior for instance discrimination and which clearly speeds up convergence in our experiments.
Finally, we propose to smooth the labels of a few of the hardest classes to avoid optimizing over very similar negative pairs.
While conceptually simple, our framework achieves competitive or superior performance compared to state-of-the-art unsupervised approaches, namely SimCLR, MoCoV2, and PIC, under the ImageNet linear evaluation protocol and on several downstream visual tasks, verifying that full instance classification is a strong pretraining technique for many semantic visual tasks.
\end{abstract}
