Recent success in contrastive learning has sparked growing interest in more effectively leveraging multiple augmented views of data. While prior methods incorporate multiple views at the loss or feature level, they primarily capture pairwise relationships and fail to model the joint structure across all views. In this work, we propose a divergence-based similarity function (DSF) that explicitly captures this joint structure by representing each set of augmented views as a distribution and measuring similarity as the divergence between distributions. Extensive experiments demonstrate that DSF consistently improves performance across diverse tasks, including kNN classification, linear evaluation, transfer learning, and distribution shift, while also achieving greater efficiency than other multi-view methods. Furthermore, we establish a connection between DSF and cosine similarity, and show that, unlike cosine similarity, DSF operates effectively without tuning a temperature hyperparameter.
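The abstract does not specify which distribution family or divergence DSF uses, so the following is only a minimal illustrative sketch of the general idea: summarize each set of augmented-view embeddings as a diagonal Gaussian and score similarity as the negative KL divergence between the two Gaussians. The function names and the Gaussian/KL choices here are assumptions for illustration, not the paper's definitive formulation.

```python
import numpy as np

def fit_diag_gaussian(views):
    # views: (n_views, dim) array of embeddings of augmented views of one sample.
    # Summarize the view set as a diagonal Gaussian (mean and per-dim variance).
    mu = views.mean(axis=0)
    var = views.var(axis=0) + 1e-6  # small floor keeps the KL numerically stable
    return mu, var

def kl_diag_gaussian(mu1, var1, mu2, var2):
    # KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ), closed form for diagonal Gaussians.
    return 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def dsf_similarity(views_a, views_b):
    # Divergence-based similarity: higher (closer to 0) means the two
    # view distributions are more alike; a set is maximally similar to itself.
    mu_a, var_a = fit_diag_gaussian(views_a)
    mu_b, var_b = fit_diag_gaussian(views_b)
    return -kl_diag_gaussian(mu_a, var_a, mu_b, var_b)
```

Note that, unlike cosine similarity, this score compares whole view sets at once rather than a single pair of embeddings, which is the "joint structure" the abstract refers to.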