Self-supervised learning relies heavily on data augmentation to extract meaningful representations from unlabeled images. While existing state-of-the-art augmentation pipelines incorporate a wide range of primitive transformations, these often disregard natural image structure. As a result, augmented samples can exhibit degraded semantic information and low stylistic diversity, hurting the downstream performance of self-supervised representations. To overcome this, we propose SASSL: Style Augmentations for Self-Supervised Learning, a novel augmentation technique based on Neural Style Transfer. The method decouples semantic and stylistic attributes in images and applies transformations exclusively to the style while preserving content, generating diverse augmented samples that better retain their semantic properties. Experimental results show our technique achieves a top-1 classification improvement of more than 2% on ImageNet compared to the well-established MoCo v2. We also measure transfer learning performance across five diverse datasets, observing significant improvements of up to 3.75%. Our experiments indicate that decoupling style from content information and transferring style across datasets to diversify augmentations can significantly improve the downstream performance of self-supervised representations.
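To make the style/content decoupling concrete, the sketch below illustrates one common way Neural Style Transfer separates the two: treating per-channel feature statistics (mean and standard deviation) as "style" and the normalized activations as "content", as in AdaIN-style transfer. This is a minimal NumPy illustration of the general idea, not SASSL's exact pipeline; the function names and the blending parameter `alpha` are hypothetical.

```python
import numpy as np

def adain_style_transfer(content, style, eps=1e-5):
    """Transfer per-channel statistics (the 'style') of `style`
    onto the normalized activations (the 'content') of `content`.
    Both inputs are (C, H, W) arrays (feature maps or images)."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    # Normalize content, then re-scale/shift with the style statistics.
    return s_std * (content - c_mu) / (c_std + eps) + s_mu

def style_augment(content, style, alpha=0.5):
    """Hypothetical augmentation step: blend the original sample with its
    fully stylized version to control augmentation strength."""
    return (1 - alpha) * content + alpha * adain_style_transfer(content, style)
```

With `alpha=1.0` the output carries the style image's channel statistics exactly while keeping the content image's spatial structure; smaller `alpha` yields milder augmentations, which is typically important for keeping semantics intact in self-supervised training.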