Contrastive learning has been applied successfully in many domains in recent years, yet its self-supervised version still poses many open challenges. Because negative samples are drawn from unlabeled data, a randomly selected sample may actually be a false negative for the anchor (a sample that in fact belongs to the same class as the anchor), which leads to incorrect encoder training. This paper proposes a new self-supervised contrastive loss, the BCL loss, that still uses random samples from the unlabeled data but corrects the resulting bias with importance weights. The key idea is to design, under a Bayesian framework, the desired sampling distribution for drawing hard true negative samples. The main advantage is that this desired sampling distribution has a parametric structure, with a location parameter for debiasing false negatives and a concentration parameter for mining hard negatives. Experiments validate the effectiveness and superiority of the BCL loss.
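The general idea of reweighting randomly drawn negatives can be illustrated with a minimal sketch. This is not the BCL loss itself, whose exact form the abstract does not give; it is a generic debiasing-plus-reweighting scheme swapped in for illustration. The function name `weighted_contrastive_loss` and the parameters `tau_plus` (an assumed prior probability that a random sample is a false negative) and `beta` (an assumed concentration that up-weights harder negatives) are hypothetical, and similarities are assumed to be cosine similarities in [-1, 1].

```python
import numpy as np


def weighted_contrastive_loss(pos_sim, neg_sim, temperature=0.5,
                              tau_plus=0.1, beta=1.0):
    """Illustrative importance-weighted contrastive loss (NOT the BCL formula).

    pos_sim:  shape (B,),   similarity of each anchor to its positive.
    neg_sim:  shape (B, N), similarity of each anchor to N random samples.
    tau_plus: assumed prior probability that a random sample is a false negative.
    beta:     assumed concentration; larger values emphasize harder negatives.
    """
    pos = np.exp(pos_sim / temperature)   # (B,)
    neg = np.exp(neg_sim / temperature)   # (B, N)
    n = neg.shape[1]

    # Self-normalized importance weights: mean 1 per anchor, larger for
    # negatives that are more similar to the anchor (harder negatives).
    w = np.exp(beta * neg_sim)
    w = w / w.mean(axis=1, keepdims=True)

    # Weighted average over the random samples, then subtract the share
    # expected to come from false negatives and renormalize.
    weighted_neg = (w * neg).mean(axis=1)
    corrected = (weighted_neg - tau_plus * pos) / (1.0 - tau_plus)

    # Clamp at the smallest value a true negative term can take, exp(-1/t),
    # so the logarithm below stays well defined.
    corrected = np.maximum(corrected, np.exp(-1.0 / temperature))

    return float(np.mean(-np.log(pos / (pos + n * corrected))))


# Usage with made-up similarities for two anchors and three random samples each.
pos_example = np.array([0.8, 0.5])
neg_example = np.array([[0.1, -0.2, 0.7], [0.3, 0.0, -0.5]])
print(weighted_contrastive_loss(pos_example, neg_example))
```

With `tau_plus=0` and `beta=0` the sketch reduces to the standard InfoNCE-style loss over uniformly weighted random negatives, which makes the two roles easy to see in isolation: one parameter corrects for false negatives, the other concentrates weight on hard negatives.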