The information noise-contrastive estimation (InfoNCE) loss function provides the basis of many self-supervised deep learning methods due to its strong empirical results and theoretical motivation. Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels. This SupCon loss has been widely used due to reports of good empirical performance. However, in this work we find that the prior SupCon loss formulation has questionable justification because it can encourage some images from the same class to repel one another in the learned embedding space. This problematic intra-class repulsion gets worse as the number of images sharing one class label increases. We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically-justified supervised extension of InfoNCE that eliminates intra-class repulsion. Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy. We additionally utilize probabilistic modeling to derive an information-theoretic bound that relates the SINCERE loss to the symmetrized KL divergence between data-generating distributions for a target class and all other classes.
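The key mechanical difference described above is that SINCERE keeps other same-class images out of the softmax denominator, so they are never pushed away from the anchor. The sketch below illustrates that idea with a minimal NumPy implementation; it is an illustrative reconstruction under the stated assumptions (the denominator for each positive pair contains only that positive and other-class negatives), not the authors' reference code, and the function name and temperature default are chosen here for illustration.

```python
import numpy as np

def sincere_loss(embeddings, labels, temperature=0.1):
    """Illustrative sketch of a SINCERE-style supervised contrastive loss.

    Assumption: for anchor i and positive p, the denominator contains only
    exp(sim(i, p)) plus terms for samples from OTHER classes, so same-class
    samples are never treated as negatives (no intra-class repulsion).

    embeddings: (N, D) array; labels: (N,) integer class labels.
    """
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    n = len(labels)
    losses = []
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        neg = [a for a in range(n) if labels[a] != labels[i]]
        if not pos or not neg:
            continue
        neg_sum = np.exp(sim[i, neg]).sum()
        for p in pos:
            # Denominator: this positive + other-class negatives only.
            # (A SupCon-style denominator would also include the other
            # same-class samples, repelling them from the anchor.)
            denom = np.exp(sim[i, p]) + neg_sum
            losses.append(-(sim[i, p] - np.log(denom)))
    return float(np.mean(losses))
```

With perfectly clustered embeddings the loss is small but nonzero, since each positive still competes against other-class negatives in its denominator.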