\begin{abstract}
A key challenge in contrastive learning is generating negative samples from a large sample set to contrast with positive samples, so as to learn a better encoding of the data. These negative samples often follow a softmax distribution that is dynamically updated during training. However, sampling from this distribution is non-trivial because computing the partition function is computationally expensive.
In this paper, we propose an \underline{E}fficient \underline{M}arkov \underline{C}hain Monte Carlo negative sampling method for \underline{C}ontrastive learning ({\algname}). We adopt the global contrastive loss introduced in \cite{yuan2022provable} and build {\algname} around an adaptive Metropolis-Hastings subroutine that generates hardness-aware negative samples in an online fashion during optimization.
We prove that {\algname} finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations.
Compared to prior work, {\algname} is the first algorithm to exhibit global convergence (to stationarity) regardless of the choice of batch size while maintaining low computation and memory costs.
Numerical experiments validate that {\algname} is effective with small-batch training and achieves performance comparable to or better than baseline algorithms. We report results for pre-training image encoders on {\tt STL-10} and {\tt Imagenet-100}.
\end{abstract}
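
As a brief illustration of why a Metropolis-Hastings sampler can avoid the partition function, consider the following sketch. The notation is assumed here for illustration only and need not match the main text: $\phi(x, z; \theta)$ is a similarity score between an anchor $x$ and a candidate negative $z$, $\tau > 0$ is a temperature, $\mathcal{D}$ is the candidate set, and $q(z' \mid z)$ is a proposal distribution. The target softmax distribution is
\[
  p_\theta(z \mid x) = \frac{\exp\bigl(\phi(x, z; \theta)/\tau\bigr)}{\sum_{z'' \in \mathcal{D}} \exp\bigl(\phi(x, z''; \theta)/\tau\bigr)},
\]
and a proposed move from $z$ to $z'$ is accepted with probability
\[
  \min\left\{1,\; \frac{p_\theta(z' \mid x)\, q(z \mid z')}{p_\theta(z \mid x)\, q(z' \mid z)}\right\}
  = \min\left\{1,\; \exp\!\left(\frac{\phi(x, z'; \theta) - \phi(x, z; \theta)}{\tau}\right) \frac{q(z \mid z')}{q(z' \mid z)}\right\}.
\]
The normalizing sum cancels in the ratio, so each step requires only the two scores $\phi(x, z; \theta)$ and $\phi(x, z'; \theta)$ rather than a pass over all of $\mathcal{D}$.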