Self-supervised learning (SSL) is a powerful technique for learning from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo reach quality on par with supervised approaches. However, this invariance may be detrimental to downstream tasks that depend on traits affected by the augmentations used during pretraining, such as color. In this paper, we propose to foster sensitivity to such characteristics in the representation space by modifying the projector network, a common component of self-supervised architectures. Specifically, we supplement the projector with information about the augmentations applied to each image. For the projector to exploit this auxiliary conditioning when solving the SSL task, the feature extractor must preserve the augmentation information in its representations. Our approach, coined Conditional Augmentation-aware Self-supervised Learning (CASSLE), is directly applicable to typical joint-embedding SSL methods regardless of their objective function. Moreover, it requires neither major changes to the network architecture nor prior knowledge of downstream tasks. In addition to analyzing sensitivity to different data augmentations, we conduct a series of experiments showing that CASSLE improves over various SSL methods and reaches state-of-the-art performance on multiple downstream tasks.
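The core mechanism described above, conditioning the projector on the augmentation parameters of each view, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dimensions, the weight initialization, and the choice to encode augmentations as a flat parameter vector are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_projector(x, w1, w2):
    # Two-layer MLP with ReLU, the typical projector shape in joint-embedding SSL.
    h = np.maximum(x @ w1, 0.0)
    return h @ w2

# Hypothetical dimensions: 128-d encoder features, 8-d augmentation
# parameter vector (e.g. crop coordinates, color-jitter strengths),
# 256-d hidden layer, 64-d projection.
feat_dim, aug_dim, hidden, proj_dim = 128, 8, 256, 64

w1 = rng.standard_normal((feat_dim + aug_dim, hidden)) * 0.05
w2 = rng.standard_normal((hidden, proj_dim)) * 0.05

features = rng.standard_normal((4, feat_dim))   # encoder output for a batch of 4 views
aug_params = rng.standard_normal((4, aug_dim))  # parameters of the augmentations applied

# Conditioning step: concatenate augmentation info with the features
# before projecting, so the projector can "explain away" augmentation
# effects instead of forcing the encoder to discard them.
z = mlp_projector(np.concatenate([features, aug_params], axis=1), w1, w2)
print(z.shape)  # (4, 64)
```

At downstream-task time the projector is discarded as usual, so only the encoder representations, which now retain augmentation-sensitive information, are used.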