This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may share semantics (\eg labels) with the anchor. However, existing theories fall short of explaining this phenomenon. We bridge this gap by analyzing CL through the lens of distributionally robust optimization (DRO), which yields several key insights: (1) CL essentially performs DRO over the negative sampling distribution, thus enabling robust performance across a variety of potential distributions and demonstrating robustness to sampling bias; (2) the temperature $\tau$ is not merely a heuristic design choice but acts as a Lagrange coefficient that regulates the size of the potential distribution set; (3) a theoretical connection is established between DRO and mutual information, which presents fresh evidence for ``InfoNCE as an estimate of MI'' and a new estimation approach for $\phi$-divergence-based generalized mutual information. We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues. ADNCE refines the potential distribution, improving performance and accelerating convergence. Extensive experiments across various domains (images, sentences, and graphs) validate the effectiveness of the proposal. The code is available at \url{https://github.com/junkangwu/ADNCE}.
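Insights (1) and (2) can be illustrated with a standard identity, the Gibbs variational principle: for a reference distribution $Q_0$ over negatives with scores $s$, $\tau \log \mathbb{E}_{Q_0}[e^{s/\tau}] = \sup_Q \{\mathbb{E}_Q[s] - \tau\,\mathrm{KL}(Q \,\|\, Q_0)\}$. The left side is the negative-sample log-sum-exp term of InfoNCE (scaled by $\tau$); the right side is the Lagrangian of a KL-constrained worst-case problem over negatives, with $\tau$ as the multiplier. The following is a minimal sketch under stated assumptions, not the authors' ADNCE code: it assumes the divergence is KL, that $Q_0$ is uniform over the sampled negatives, ignores the positive term in the InfoNCE denominator, and uses synthetic scores. All function names are hypothetical. It checks numerically that the supremum is attained by the exponentially tilted distribution $Q^*(i) \propto Q_0(i)\,e^{s_i/\tau}$, and that the implied KL radius shrinks as $\tau$ grows.

```python
import math
import random


def log_mean_exp(xs):
    """Numerically stable log((1/n) * sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def negative_term(scores, tau):
    """tau * log E_{Q0}[exp(s / tau)], with Q0 assumed uniform over the sampled negatives."""
    return tau * log_mean_exp([s / tau for s in scores])


def tilted_distribution(scores, tau):
    """Worst-case Q*(i) proportional to Q0(i) * exp(s_i / tau), with Q0 uniform."""
    z = [s / tau for s in scores]
    m = max(z)
    w = [math.exp(v - m) for v in z]
    total = sum(w)
    return [v / total for v in w]


def kl_to_uniform(q):
    """KL(q || uniform) = sum_i q_i * log(n * q_i)."""
    n = len(q)
    return sum(p * math.log(n * p) for p in q if p > 0)


def expectation(q, scores):
    """E_q[s] for a discrete distribution q over the negatives."""
    return sum(p * s for p, s in zip(q, scores))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical cosine similarities between one anchor and 128 sampled negatives.
    scores = [rng.uniform(-1.0, 1.0) for _ in range(128)]
    for tau in (0.05, 0.2, 1.0):
        q_star = tilted_distribution(scores, tau)
        radius = kl_to_uniform(q_star)  # KL radius implied by this tau
        primal = expectation(q_star, scores) - tau * radius
        dual = negative_term(scores, tau)  # should match primal
        print(f"tau={tau:<5} KL radius={radius:.3f} primal={primal:.6f} dual={dual:.6f}")
```

In this sketch a smaller $\tau$ yields a larger KL radius, i.e. a larger set of candidate negative distributions and a more adversarial reweighting toward high-similarity negatives, which is one way to read the claim that $\tau$ regulates the size of the potential distribution set.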