Contrastive learning (CL) has achieved impressive performance in self-supervised learning tasks, showing superior generalization ability. Inspired by this success, adopting CL in collaborative filtering (CF) has become prevalent in semi-supervised top-K recommendation. The basic idea is to routinely conduct heuristic-based data augmentation and apply contrastive losses (e.g., InfoNCE) on the augmented views. Yet some CF-specific challenges make this adoption suboptimal, such as out-of-distribution issues, the risk of false negatives, and the nature of top-K evaluation. These challenges require CL-based CF schemes to focus more on mining hard negatives and on distinguishing false negatives among the vast unlabeled user-item interactions, in order to obtain informative contrast signals. Worse still, there is limited understanding of contrastive loss in CF methods, especially with respect to its generalization ability. To bridge this gap, we delve into the reasons underpinning the success of contrastive loss in CF and propose a principled Adversarial InfoNCE loss (AdvInfoNCE), a variant of InfoNCE specially tailored for CF methods. AdvInfoNCE adaptively explores and assigns a hardness to each negative instance in an adversarial fashion, and further utilizes a fine-grained hardness-aware ranking criterion to strengthen the recommender's generalization ability. Training CF models with AdvInfoNCE, we validate its effectiveness on both synthetic and real-world benchmark datasets, thus showing its generalization ability to mitigate out-of-distribution problems. Given the theoretical guarantees and empirical superiority of AdvInfoNCE over most contrastive loss functions, we advocate its adoption as a standard loss in recommender systems, particularly for out-of-distribution tasks. Code is available at https://github.com/LehengTHU/AdvInfoNCE.
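As a rough illustration of the contrastive loss referred to above, the following is a minimal sketch of the standard InfoNCE loss for a single (user, positive item) pair. The optional `neg_weights` argument is a hypothetical addition, included only to show how a per-negative hardness could enter the loss; it is not the AdvInfoNCE formulation, and the function name, the temperature default, and the fixed (rather than adversarially learned) weights are all assumptions made for this example.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.1, neg_weights=None):
    """InfoNCE loss for one (user, positive item) pair.

    pos_score and neg_scores are similarity scores, e.g. cosine similarities
    between a user embedding and item embeddings.

    neg_weights is a hypothetical per-negative multiplier (must be > 0).
    None means every weight is 1, which recovers standard InfoNCE.
    """
    if neg_weights is None:
        neg_weights = [1.0] * len(neg_scores)
    if len(neg_weights) != len(neg_scores):
        raise ValueError("neg_weights and neg_scores must have the same length")

    # Multiplying exp(s / t) by w is the same as adding log(w) to the logit.
    logits = [pos_score / temperature]
    logits += [s / temperature + math.log(w)
               for s, w in zip(neg_scores, neg_weights)]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))

    # loss = -log( exp(pos) / (exp(pos) + sum_j w_j * exp(neg_j)) )
    return log_denominator - logits[0]


if __name__ == "__main__":
    plain = info_nce(0.8, [0.2, 0.1])
    weighted = info_nce(0.8, [0.2, 0.1], neg_weights=[2.0, 1.0])
    print(plain, weighted)
```

In this sketch, giving a negative a weight above 1 makes it count more in the denominator, so the loss pushes harder against that item; a weight below 1 softens the push, which is one intuitive way to treat a suspected false negative.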