As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant approach to implementing contrastive learning is to apply InfoNCE loss: by capturing the similarities between pairs, InfoNCE loss enables learning representations of data. Despite its success, adopting InfoNCE loss requires tuning a temperature, a core hyperparameter for calibrating similarity scores. Although several studies have emphasized the importance of the temperature and the sensitivity of performance to its value, finding a suitable temperature requires extensive trial-and-error experiments, which makes InfoNCE loss harder to adopt. To address this difficulty, we propose a novel method for deploying InfoNCE loss without temperature. Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss. In addition to removing the temperature hyperparameter, we observe that the proposed method also yields a performance gain in contrastive learning. Our detailed theoretical analysis reveals that the current practice of temperature scaling in InfoNCE loss causes serious problems in gradient descent, whereas our method provides desirable gradient properties. The proposed method was validated on five contrastive learning benchmarks, yielding satisfactory results without temperature tuning.
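To make the contrast concrete, the following is a minimal NumPy sketch of the two ways of turning cosine similarities into logits: dividing by a temperature, and applying the inverse hyperbolic tangent. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formula, so the plain `arctanh` mapping (with no additional scale factor), the clipping constant `eps`, and all function names here are hypothetical choices made for this example.

```python
import numpy as np


def _log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cosine_similarity_matrix(a, b):
    # Pairwise cosine similarities between the rows of a and the rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def info_nce_temperature(a, b, temperature=0.1):
    # Standard InfoNCE: similarities are divided by a tuned temperature.
    # Row i of a and row i of b form the positive pair.
    logits = cosine_similarity_matrix(a, b) / temperature
    return -np.mean(np.diag(_log_softmax(logits)))


def info_nce_arctanh(a, b, eps=1e-6):
    # Temperature-free variant (assumed form): similarities are mapped
    # through arctanh instead of being divided by a temperature.
    # Clipping is an assumption added here, because arctanh diverges at +/-1.
    sim = np.clip(cosine_similarity_matrix(a, b), -1.0 + eps, 1.0 - eps)
    logits = np.arctanh(sim)
    return -np.mean(np.diag(_log_softmax(logits)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(8, 16))
    positives = anchors + 0.05 * rng.normal(size=(8, 16))
    print("temperature-scaled:", info_nce_temperature(anchors, positives))
    print("arctanh-based:     ", info_nce_arctanh(anchors, positives))
```

In this sketch the only difference between the two losses is the line that builds `logits`. The `arctanh` version has no value to search over, whereas the temperature version requires choosing `temperature`.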