A key challenge in contrastive learning is generating negative samples from a large sample set to contrast with positive samples, in order to learn better encodings of the data. These negative samples often follow a softmax distribution that is dynamically updated during training. However, sampling from this distribution is non-trivial because computing its partition function (the normalizing sum over the entire sample set) is computationally expensive. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We adopt the global contrastive learning loss introduced in SogCLR, and EMC$^2$ uses an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior work, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while incurring low computation and memory costs. Numerical experiments validate that EMC$^2$ is effective with small-batch training and achieves performance comparable to or better than baseline algorithms. We report results for pre-training image encoders on STL-10 and ImageNet-100.
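To illustrate why a Metropolis-Hastings chain sidesteps the partition function, here is a minimal generic sketch. It is not the EMC$^2$ algorithm itself. It samples an index from a finite candidate set with probability proportional to exp(score / tau), where the scores stand in for anchor-to-negative similarities. The names `mh_chain` and `softmax`, the uniform independent proposal, and the temperature `tau` are illustrative assumptions. Because the uniform proposal is symmetric, the acceptance probability depends only on the ratio exp((s_j - s_i) / tau), so the normalizing constant cancels and is never computed.

```python
import math
import random


def softmax(scores, tau):
    """Exact softmax over all candidates (needs the full partition function).

    Used here only as a reference to check the MCMC sampler.
    """
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / tau) for s in scores]
    z = sum(weights)  # the partition function the sampler below avoids
    return [w / z for w in weights]


def mh_chain(scores, tau, start, num_steps, rng):
    """Metropolis-Hastings chain whose stationary law is softmax(scores / tau).

    Assumption for illustration: proposals are drawn uniformly at random
    from all candidate indices. This proposal is symmetric, so the
    acceptance probability is min(1, exp((s_new - s_cur) / tau)).
    Returns the sequence of visited indices (one per step).
    """
    n = len(scores)
    current = start
    samples = []
    for _ in range(num_steps):
        proposal = rng.randrange(n)
        # Log of the target-probability ratio; no normalizer required.
        log_ratio = (scores[proposal] - scores[current]) / tau
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        samples.append(current)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [2.0, 1.0, 0.5, -1.0]  # hypothetical similarity scores
    samples = mh_chain(scores, tau=1.0, start=3, num_steps=100_000, rng=rng)
    freq = [samples.count(k) / len(samples) for k in range(len(scores))]
    print("MCMC frequencies:", [round(f, 3) for f in freq])
    print("exact softmax:   ", [round(p, 3) for p in softmax(scores, 1.0)])
```

Running the chain long enough makes the empirical frequencies approach the exact softmax probabilities, while each step touches only two scores. In an online training loop one could, as an assumption, keep the chain's current state across iterations and take only a few steps per iteration as the scores drift; this sketch does not reproduce the adaptive scheme used by EMC$^2$.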