The standard approach to modern self-supervised learning is to generate random views through data augmentations and minimise a loss computed from the representations of these views. This inherently encourages invariance to the transformations that make up the data augmentation function. In this work, we show that adding a module that constrains the representations to be predictive of an affine transformation improves both the performance and the efficiency of the learning process. The module is agnostic to the base self-supervised model and takes the form of an additional loss term that encourages an aggregation of the encoder representations to be predictive of the affine transformation applied to the input images. We perform experiments with various modern self-supervised models and observe a performance improvement in all cases. Further, we perform ablation studies on the components of the affine transformation, to understand which of them affects performance the most, and on key architectural design decisions.
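The additional loss term described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the parameterisation of the affine transformation (rotation, translation, scale, shear), the sampling ranges, the use of concatenation as the aggregation, the linear prediction head, the mean squared error, and all names (`sample_affine_params`, `aggregate`, `linear_head`, `total_loss`, `affine_weight`) are assumptions made for illustration only.

```python
import math
import random


def sample_affine_params(rng):
    """Sample hypothetical affine parameters (ranges are illustrative assumptions)."""
    return [
        rng.uniform(-math.pi / 6, math.pi / 6),  # rotation angle in radians
        rng.uniform(-0.1, 0.1),                  # translation along x
        rng.uniform(-0.1, 0.1),                  # translation along y
        rng.uniform(0.8, 1.2),                   # isotropic scale
        rng.uniform(-0.2, 0.2),                  # shear angle in radians
    ]


def affine_matrix(params):
    """Build the 2x3 matrix scale * (rotation @ shear) with a translation column."""
    angle, tx, ty, scale, shear = params
    c, s = math.cos(angle), math.sin(angle)
    k = math.tan(shear)
    return [
        [scale * c, scale * (c * k - s), tx],
        [scale * s, scale * (s * k + c), ty],
    ]


def aggregate(z_a, z_b):
    """Aggregate two view representations; concatenation is an assumption."""
    return list(z_a) + list(z_b)


def linear_head(weights, bias, features):
    """Hypothetical linear prediction head: one output per affine parameter."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def mse(pred, target):
    """Mean squared error between predicted and true affine parameters."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def total_loss(base_loss, z_a, z_b, weights, bias, target, affine_weight=1.0):
    """Base self-supervised loss plus the weighted affine prediction term."""
    pred = linear_head(weights, bias, aggregate(z_a, z_b))
    return base_loss + affine_weight * mse(pred, target)


# Toy usage with random stand-ins for the encoder outputs of two views.
rng = random.Random(0)
dim = 4
z_a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
target = sample_affine_params(rng)
weights = [[rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(len(target))]
bias = [0.0] * len(target)
loss = total_loss(0.5, z_a, z_b, weights, bias, target, affine_weight=1.0)
```

Because the extra term only consumes encoder representations and adds a scalar to an existing loss, a sketch of this shape can sit on top of any base objective, which is consistent with the module being agnostic to the base self-supervised model.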