Self-supervised contrastive learning (CL) has achieved state-of-the-art performance in representation learning by minimizing the distance between positive pairs while maximizing the distance between negative pairs. Recent work has shown that a model learns better representations from diversely augmented positive pairs, because they make the model more view-invariant. However, only a few studies on CL have considered the difference between augmented views, and those have not gone beyond hand-crafted findings. In this paper, we first observe that the score-matching function can measure how much a sample has changed from the original through augmentation. Using this property, every pair in CL can be weighted adaptively by the difference of its score values, which boosts the performance of existing CL methods. We show the generality of our method, referred to as ScoreCL, by consistently improving various CL methods (SimCLR, SimSiam, W-MSE, and VICReg) by up to 3 percentage points in k-NN evaluation on CIFAR-10, CIFAR-100, and ImageNet-100. Moreover, we conduct exhaustive experiments and ablations, including results on diverse downstream tasks, comparisons with possible baselines, and the improvement obtained when ScoreCL is combined with other proposed augmentation methods. We hope our exploration will inspire more research on exploiting score matching for CL.
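The adaptive weighting described above can be illustrated with a minimal sketch. It is not the ScoreCL implementation: the function names `pair_weights` and `weighted_info_nce`, the choice of an InfoNCE-style loss, the use of a scalar score magnitude per view, the mean-normalization of the weights, and the assumption that a larger score difference yields a larger weight are all hypothetical choices made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pair_weights(score_a, score_b, eps=1e-8):
    """One weight per positive pair from the absolute score difference.

    score_a[i] and score_b[i] are assumed to be scalar score magnitudes
    of the two augmented views of sample i (hypothetical inputs).
    Weights are divided by their mean so that they average to about 1.
    """
    diffs = [abs(a - b) for a, b in zip(score_a, score_b)]
    mean_diff = sum(diffs) / len(diffs)
    if mean_diff == 0.0:
        # All views changed equally: fall back to uniform weights.
        return [1.0] * len(diffs)
    return [d / (mean_diff + eps) for d in diffs]

def weighted_info_nce(z_a, z_b, weights, temperature=0.5):
    """InfoNCE-style loss in which each positive pair's term is scaled.

    z_a[i] and z_b[i] are embeddings of the two views of sample i;
    z_b[j] with j != i serves as a negative for z_a[i].
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += weights[i] * (log_denom - logits[i])
    return total / n

# Toy usage with made-up embeddings and score magnitudes.
z_a = [[1.0, 0.0], [0.0, 1.0]]
z_b = [[0.9, 0.1], [0.2, 0.8]]
w = pair_weights([1.0, 2.0], [1.5, 3.5])
loss = weighted_info_nce(z_a, z_b, w)
```

In this toy setup the second pair has the larger score difference, so its term contributes more to the loss; with uniform weights the function reduces to a plain one-directional InfoNCE loss.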