In contemporary self-supervised contrastive algorithms such as SimCLR and MoCo, the balance between attracting semantically similar samples and repelling samples from different classes is governed largely by hard negative samples. Although the InfoNCE loss has been shown to impose penalties according to hardness, the temperature hyper-parameter is the key to regulating those penalties and the trade-off between uniformity and tolerance. In this work, we focus on improving the performance of the InfoNCE loss in self-supervised learning (SSL) by studying the effect of temperature hyper-parameter values. We propose a cosine similarity-dependent temperature scaling function to effectively optimize the distribution of samples in the feature space. We further analyze the uniformity and tolerance metrics to identify the regions of the cosine similarity space that are most favorable for optimization. Additionally, we offer a comprehensive examination of how local and global structures in the feature space behave throughout the pre-training phase as the temperature varies. Experimental evidence shows that the proposed framework outperforms or is on par with contrastive loss-based SSL algorithms. We believe our work (DySTreSS) on temperature scaling in SSL provides a foundation for future research in contrastive learning.
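To make the idea of a cosine similarity-dependent temperature concrete, the following is a minimal sketch of an InfoNCE loss in which each pair of samples gets its own temperature computed from that pair's cosine similarity. The abstract does not state the functional form of the scaling function, so the cosine-shaped schedule used here, the helper names `cosine_temperature` and `info_nce_dynamic_temperature`, and the bounds `t_min` and `t_max` are all illustrative assumptions rather than the method proposed in this work.

```python
import numpy as np


def cosine_temperature(sim, t_min=0.1, t_max=0.2):
    """Hypothetical schedule mapping cosine similarity in [-1, 1] to a temperature.

    Assumption for illustration only: the temperature equals t_max at
    sim = -1 and sim = +1 and dips to t_min at sim = 0.
    """
    return t_min + 0.5 * (t_max - t_min) * (1.0 + np.cos(np.pi * (1.0 + sim)))


def info_nce_dynamic_temperature(z1, z2, t_min=0.1, t_max=0.2):
    """InfoNCE over two views, with a per-pair temperature.

    z1, z2: arrays of shape (N, D); row i of z1 and row i of z2 form a
    positive pair, and every other row of z2 serves as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)

    sim = np.clip(z1 @ z2.T, -1.0, 1.0)          # (N, N), diagonal = positives
    tau = cosine_temperature(sim, t_min, t_max)  # one temperature per pair
    logits = sim / tau

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the positive pair, averaged over the batch.
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.1 * rng.normal(size=(8, 16))  # lightly perturbed second view
loss = info_nce_dynamic_temperature(view_a, view_b)
```

With a fixed temperature, `tau` would be a single scalar. Making it a function of `sim` lets the penalty on a negative pair depend on how hard that pair is, which is the mechanism the abstract describes for regulating the uniformity-tolerance trade-off.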