The information noise-contrastive estimation (InfoNCE) loss function provides the basis of many self-supervised deep learning methods due to its strong empirical results and theoretical motivation. Previous work proposed a supervised contrastive (SupCon) loss that extends InfoNCE to learn from available class labels. This SupCon loss has been widely used because of reports of good empirical performance. However, in this work we suggest that the specific SupCon loss formulated by prior work has questionable theoretical justification, because it can encourage images from the same class to repel one another in the learned embedding space. This problematic behavior becomes worse as the number of inputs sharing one class label increases. We propose the Supervised InfoNCE REvisited (SINCERE) loss as a remedy. SINCERE is a theoretically justified supervised extension of InfoNCE that never causes images from the same class to repel one another. We further show that minimizing our new loss is equivalent to maximizing a bound on the KL divergence between class-conditional embedding distributions. We compare SINCERE and SupCon losses in terms of learning trajectories during pretraining and in ultimate linear classifier performance after finetuning. Our proposed SINCERE loss better separates embeddings from different classes during pretraining while delivering competitive accuracy.
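The following is a minimal pure-Python sketch to make the contrast concrete. It is not the authors' implementation. It assumes the commonly stated SupCon form, in which each anchor's denominator sums over every other sample in the batch, including other samples of the same class. It also assumes a SINCERE-style variant in which the denominator for each same-class pair contains only that pair plus the differently labeled samples. The helper names `supcon_loss`, `sincere_loss`, and `_dot`, the temperature `tau`, and the toy batch are all hypothetical. Embeddings are assumed to be L2-normalized.

```python
import math
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product; equals cosine similarity for L2-normalized embeddings."""
    return sum(a * b for a, b in zip(u, v))


def supcon_loss(z: List[Sequence[float]], labels: List[int], tau: float = 0.1) -> float:
    """Assumed SupCon form: each anchor's denominator sums over ALL other
    samples, so other same-class samples also sit in the denominator."""
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner contribute nothing
        log_denom = math.log(
            sum(math.exp(_dot(z[i], z[a]) / tau) for a in range(n) if a != i)
        )
        # Average of -log softmax(anchor i -> positive p) over this anchor's positives.
        total += sum(log_denom - _dot(z[i], z[p]) / tau for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("batch contains no same-class pairs")
    return total / anchors


def sincere_loss(z: List[Sequence[float]], labels: List[int], tau: float = 0.1) -> float:
    """Hypothetical SINCERE-style form: for each same-class pair (i, p) the
    denominator holds only that pair plus differently labeled samples."""
    n = len(z)
    total, pairs = 0.0, 0
    for i in range(n):
        # Sum over samples whose label differs from the anchor's (true negatives only).
        neg_sum = sum(
            math.exp(_dot(z[i], z[k]) / tau) for k in range(n) if labels[k] != labels[i]
        )
        for p in range(n):
            if p == i or labels[p] != labels[i]:
                continue
            pos = math.exp(_dot(z[i], z[p]) / tau)
            # -log(pos / (pos + neg_sum)) written as log(1 + neg_sum / pos).
            total += math.log1p(neg_sum / pos)
            pairs += 1
    if pairs == 0:
        raise ValueError("batch contains no same-class pairs")
    return total / pairs


# Toy batch: three identical class-0 embeddings and one orthogonal class-1 embedding.
batch = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
labels = [0, 0, 0, 1]
print(supcon_loss(batch, labels, tau=0.05))   # approximately log(2), about 0.693
print(sincere_loss(batch, labels, tau=0.05))  # close to 0 (on the order of 1e-9)
```

In this toy batch the class-0 embeddings are already perfectly clustered. Even so, the SupCon-style value stays near log 2 (two positives per anchor) as `tau` shrinks, because each anchor's other positive is counted in the denominator alongside the true negative. The SINCERE-style value approaches zero. This residual grows with the number of same-class samples, which is consistent with the abstract's remark that the problem worsens as more inputs share a label.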