Recent years have witnessed many successful applications of contrastive learning in diverse domains, yet its self-supervised version still faces many open challenges. Because the negative samples are drawn from unlabeled datasets, a randomly selected sample may actually be a false negative for an anchor, which leads to incorrect encoder training. This paper proposes a new self-supervised contrastive loss, the BCL loss, which still uses random samples from the unlabeled data while correcting the resulting bias with importance weights. The key idea is to design, under the Bayesian framework, the desired sampling distribution for drawing hard true negative samples. Its main advantage is that the desired sampling distribution has a parametric structure, with a location parameter for debiasing false negatives and a concentration parameter for mining hard negatives. Experiments validate the effectiveness and superiority of the BCL loss.
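The idea of keeping randomly drawn negatives and correcting them with importance weights can be illustrated with a minimal sketch. This is not the BCL loss itself: the abstract does not give its weight formula, so the function names `hardness_weights` and `weighted_info_nce`, the softmax form of the weights, and the parameter `beta` (standing in for a concentration parameter) are all assumptions made for illustration. The location parameter for false-negative debiasing is not modeled here.

```python
import numpy as np


def hardness_weights(neg_sims, beta):
    """Hypothetical self-normalized importance weights (an assumption,
    not the BCL formula): softmax of beta * similarity.
    beta = 0 gives uniform weights; larger beta favors harder negatives."""
    z = beta * np.asarray(neg_sims, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def weighted_info_nce(pos_sim, neg_sims, weights, temperature=0.5):
    """InfoNCE-style loss for one anchor with reweighted random negatives.
    With uniform weights this reduces to the standard InfoNCE loss."""
    neg_sims = np.asarray(neg_sims, dtype=float)
    pos = np.exp(pos_sim / temperature)
    n = neg_sims.size
    # Weighted average of negative terms, rescaled by the number of negatives.
    neg = n * np.sum(weights * np.exp(neg_sims / temperature))
    return float(-np.log(pos / (pos + neg)))


# Illustrative similarities between one anchor and its samples (made-up values).
neg = np.array([0.1, 0.4, -0.2, 0.7])
uniform_loss = weighted_info_nce(0.8, neg, hardness_weights(neg, beta=0.0))
hard_loss = weighted_info_nce(0.8, neg, hardness_weights(neg, beta=2.0))
```

In this sketch, raising `beta` shifts weight toward the negatives most similar to the anchor, so `hard_loss` exceeds `uniform_loss`; some of those high-similarity samples may be false negatives, which is the bias a separate debiasing term would have to address.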