Self-supervised visual representation methods are closing the gap with supervised learning performance. These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations. This can be seen as a task that encourages embeddings to leave out factors modified by these augmentations, i.e. to be invariant to them. However, this only considers one side of the trade-off in the choice of augmentations: they need to modify the images strongly enough to prevent shortcut learning through simple solutions (e.g. using only color histograms), but on the other hand, augmentation-related information may then be missing from the representations for some downstream tasks (e.g. color is important for bird and flower classification). A few recent works have proposed to mitigate the problem of using only an invariance task by exploring some form of equivariance to augmentations. This has been done by learning one or more additional embedding spaces, where some augmentations cause embeddings to differ, yet in a non-controlled way. In this work, we introduce EquiMod, a generic equivariance module that structures the learned latent space, in the sense that our module learns to predict the displacement in the embedding space caused by the augmentations. We show that applying this module to state-of-the-art invariance models, such as SimCLR and BYOL, improves performance on the CIFAR10 and ImageNet datasets. Moreover, while our model could collapse to a trivial equivariance, i.e. invariance, we observe that it instead automatically learns to keep some augmentation-related information that is beneficial to the representations.
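The idea of predicting the displacement in the embedding space caused by an augmentation can be illustrated with a small sketch. The abstract does not specify the predictor architecture, the loss, or how augmentation parameters are encoded, so everything below is an assumption made for illustration: the linear predictor, the cosine-based loss, the weight matrix `W`, and the names `predict_displacement` and `equivariance_loss` are all hypothetical and are not taken from EquiMod itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_displacement(z, t, W):
    """Hypothetical linear predictor (an assumption, not the paper's module).

    Maps the concatenation of an embedding z and augmentation parameters t
    to a displacement vector in the embedding space.
    """
    inp = z + t  # list concatenation: [z; t]
    return [sum(w * x for w, x in zip(row, inp)) for row in W]


def equivariance_loss(z_orig, z_aug, t, W):
    """Assumed loss: 1 - cos(predicted augmented embedding, actual one)."""
    delta = predict_displacement(z_orig, t, W)
    z_pred = [a + d for a, d in zip(z_orig, delta)]
    return 1.0 - cosine_similarity(z_pred, z_aug)


# Toy setup: 2-D embeddings, one augmentation parameter (e.g. a color shift).
z_orig = [1.0, 0.0]   # embedding of the original image
t = [0.5]             # augmentation strength
z_aug = [1.0, 0.5]    # embedding of the augmented image (moved along axis 2)

# A predictor that moves the second coordinate by t matches the displacement.
W_good = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0]]
# A predictor that ignores t predicts no displacement at all.
W_zero = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]

loss_good = equivariance_loss(z_orig, z_aug, t, W_good)
loss_zero = equivariance_loss(z_orig, z_aug, t, W_zero)

# Trivial equivariance: if the encoder is invariant (z_aug == z_orig),
# a zero-displacement predictor already achieves zero loss.
loss_invariant = equivariance_loss(z_orig, z_orig, t, W_zero)
```

In this toy case the zero-displacement predictor is only optimal when the encoder discards the augmentation entirely, which is the collapse to invariance mentioned in the abstract. When the embedding does retain augmentation-related information, the predictor has to use `t` to reach a low loss.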