Previous theoretical work on contrastive learning (CL) with InfoNCE showed that, under certain assumptions, the learned representations uncover the ground-truth latent factors. We argue these theories overlook crucial aspects of how CL is deployed in practice. Specifically, they assume that within a positive pair, all latent factors either vary to a similar extent, or that some do not vary at all. However, in practice, positive pairs are often generated using augmentations such as strong cropping to just a few pixels. Hence, a more realistic assumption is that all latent factors change, with a continuum of variability across these factors. We introduce AnInfoNCE, a generalization of InfoNCE that can provably uncover the latent factors in this anisotropic setting, broadly generalizing previous identifiability results in CL. We validate our identifiability results in controlled experiments and show that AnInfoNCE increases the recovery of previously collapsed information in CIFAR10 and ImageNet, albeit at the cost of downstream accuracy. Additionally, we explore and discuss further mismatches between theoretical assumptions and practical implementations, including extensions to hard negative mining and loss ensembles.
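The abstract does not state the AnInfoNCE objective itself, but the idea of an anisotropic generalization of InfoNCE can be illustrated with a minimal NumPy sketch. The sketch below is an assumption, not the paper's definition: it replaces the isotropic squared-Euclidean similarity in InfoNCE with a learnable positive diagonal weighting `lam` over latent dimensions, so different factors may contribute with different (anisotropic) strengths; setting `lam` to all ones recovers a standard InfoNCE with negative squared-distance logits.

```python
import numpy as np

def aninfonce_loss(z1, z2, log_lambda, tau=1.0):
    """Hedged sketch of an anisotropically weighted InfoNCE loss.

    z1, z2     : (n, d) arrays of paired representations; (z1[i], z2[i])
                 is the positive pair, all other rows of z2 are negatives.
    log_lambda : (d,) learnable log-weights; lam = exp(log_lambda) > 0
                 weights each latent dimension (the anisotropy).
    """
    lam = np.exp(log_lambda)                      # (d,) positive per-dim weights
    diff = z1[:, None, :] - z2[None, :, :]        # (n, n, d) pairwise differences
    dist = np.einsum('ijd,d->ij', diff ** 2, lam) # weighted squared distances
    logits = -dist / tau
    # InfoNCE = cross-entropy with the matching index as the positive class
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In a full pipeline `log_lambda` would be optimized jointly with the encoder; here it is fixed for illustration. With `log_lambda = np.zeros(d)` the loss reduces to the isotropic special case.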