Self-supervised learning relies heavily on data augmentation to extract meaningful representations from unlabeled images. Existing state-of-the-art augmentation pipelines combine a wide range of primitive transformations, but these often disregard natural image structure. As a result, augmented samples can exhibit degraded semantic information and low stylistic diversity, which harms the downstream performance of self-supervised representations. To overcome this, we propose SASSL: Style Augmentations for Self Supervised Learning, a novel augmentation technique based on Neural Style Transfer. The method decouples the semantic and stylistic attributes of an image and applies transformations exclusively to the style while preserving the content, generating diverse augmented samples that better retain their semantic properties. Experimental results show that our technique improves top-1 classification performance on ImageNet by more than 2% compared to the well-established MoCo v2. We also measure transfer learning performance across five diverse datasets and observe significant improvements of up to 3.75%. Our experiments indicate that decoupling style from content information and transferring style across datasets to diversify augmentations can significantly improve the downstream performance of self-supervised representations.
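To make the idea of transforming style while preserving content concrete, the following is a minimal sketch and not the SASSL implementation. The abstract does not specify the style transfer model, so the sketch swaps in a much simpler stand-in: matching per-channel mean and standard deviation directly in pixel space (in the spirit of adaptive instance normalization, but without any learned encoder or decoder). The names `style_augment`, `two_views`, `style_pool`, `alpha`, and `alpha_range` are hypothetical and chosen only for illustration.

```python
import numpy as np


def channel_stats(img, eps=1e-5):
    """Per-channel mean and std of an HxWxC float image."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + eps
    return mean, std


def style_augment(content, style, alpha=0.5):
    """Shift the channel statistics of `content` toward those of `style`.

    ASSUMPTION: pixel-space statistics matching stands in for a neural
    style transfer network. The spatial layout of `content` is untouched;
    only global color statistics (a crude proxy for "style") change.
    alpha=0 returns the content image, alpha=1 applies the full swap.
    Images are expected as floats in [0, 1].
    """
    c_mean, c_std = channel_stats(content)
    s_mean, s_std = channel_stats(style)
    stylized = (content - c_mean) / c_std * s_std + s_mean
    mixed = (1.0 - alpha) * content + alpha * stylized
    return np.clip(mixed, 0.0, 1.0)


def two_views(image, style_pool, rng, alpha_range=(0.1, 0.3)):
    """Build two stylized views of one image for a contrastive objective.

    ASSUMPTION: each view draws its own style image and blending strength;
    `alpha_range` is an illustrative choice, not a value from the paper.
    """
    views = []
    for _ in range(2):
        style = style_pool[rng.integers(len(style_pool))]
        alpha = rng.uniform(*alpha_range)
        views.append(style_augment(image, style, alpha))
    return views
```

In a contrastive setup such as MoCo v2, the two returned views would play the role of the query and key images. Drawing `style_pool` from a different dataset than the training images corresponds to the cross-dataset style transfer mentioned in the abstract.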