Self-supervised learning (SSL), including mainstream contrastive learning, has achieved great success in learning visual representations without data annotations. However, most methods focus on instance-level information (\ie, different augmented images of the same instance should have the same feature or be clustered into the same class) and pay little attention to the relationships between different instances. In this paper, we introduce a novel SSL paradigm, relational self-supervised learning (ReSSL), a framework that learns representations by modeling the relationships between different instances. Specifically, our method uses the sharpened distribution of pairwise similarities among different instances as a \textit{relation} metric, which is then used to match the feature embeddings of different augmentations. To boost performance, we argue that weak augmentations matter for representing a more reliable relation, and we leverage a momentum strategy for practical efficiency. The designed asymmetric predictor head and an InfoNCE warm-up strategy enhance robustness to hyper-parameters and improve the resulting performance. Experimental results show that ReSSL substantially outperforms state-of-the-art methods across different network architectures, including various lightweight networks (\eg, EfficientNet and MobileNet).
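To make the relation metric concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the embeddings of the weakly and strongly augmented views have already been computed, so the encoders, momentum update, predictor head, and InfoNCE warm-up are all omitted. The names `anchors` (embeddings of other instances, for example drawn from a memory queue), `tau_target`, and `tau_online`, along with their default values, are illustrative assumptions. The relation of each view is a softmax over its similarities to the anchors; the target relation uses a lower temperature so that it is sharper, and the loss is the cross-entropy between the two relations.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def relation_log_distribution(embeddings, anchors, temperature):
    """Log of the similarity distribution of each embedding over the anchors.

    A lower temperature yields a sharper distribution.
    """
    sims = l2_normalize(embeddings) @ l2_normalize(anchors).T
    return log_softmax(sims / temperature)


def relational_loss(z_weak, z_strong, anchors, tau_target=0.04, tau_online=0.1):
    """Cross-entropy between the sharpened target relation (weak view)
    and the predicted relation (strong view), averaged over the batch.

    The temperature defaults are assumed values for illustration only.
    """
    target = np.exp(relation_log_distribution(z_weak, anchors, tau_target))
    log_pred = relation_log_distribution(z_strong, anchors, tau_online)
    return float(-(target * log_pred).sum(axis=-1).mean())


# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 8))                    # other instances
z_weak = rng.normal(size=(4, 8))                      # weak-view embeddings
z_strong = z_weak + 0.1 * rng.normal(size=(4, 8))     # strong-view embeddings
print(relational_loss(z_weak, z_strong, anchors))
```

In a training loop the target relation would be treated as a constant (no gradient), so only the strong-view branch is updated by this loss; that detail is not visible in a NumPy-only sketch.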