
\begin{abstract}

A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, so as to learn a better encoding of the data. These negative samples often follow a softmax distribution that is dynamically updated during training. However, sampling from this distribution is non-trivial due to the high computational cost of computing the partition function.

In this paper, we propose an \underline{E}fficient \underline{M}arkov \underline{C}hain Monte Carlo negative sampling method for \underline{C}ontrastive learning ({\algname}). Building on the global contrastive learning loss introduced in \cite{yuan2022provable}, {\algname} uses an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during optimization.

We prove that {\algname} finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations.

Compared to prior works, {\algname} is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while incurring low computation and memory costs.

Numerical experiments validate that {\algname} is effective with small-batch training and achieves comparable or better performance than baseline algorithms. We report results for pre-training image encoders on {\tt STL-10} and {\tt ImageNet-100}.

\end{abstract}
