Existing data augmentation in self-supervised learning, while diverse, fails to preserve the inherent structure of natural images. This results in distorted augmented samples with compromised semantic information, ultimately degrading downstream performance. To overcome this limitation, we propose SASSL: Style Augmentations for Self-Supervised Learning, a novel data augmentation technique based on Neural Style Transfer. SASSL decouples semantic and stylistic attributes in images and applies transformations exclusively to their style while preserving content, generating diverse samples that better retain semantic information. SASSL boosts top-1 image classification accuracy on ImageNet by up to 2 percentage points over established self-supervised methods such as MoCo, SimCLR, and BYOL, while achieving superior transfer learning performance across various datasets. Because SASSL can run asynchronously as part of the data augmentation pipeline, these gains come at no cost in pretraining throughput.
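To make the style/content decoupling concrete, the sketch below illustrates one common style-transfer mechanism, Adaptive Instance Normalization (AdaIN): per-channel feature statistics (mean and standard deviation) are swapped from a style image's features onto a content image's features, leaving the spatial structure of the content intact. This is a minimal illustration of the general idea, not the paper's exact SASSL implementation; the function name and array shapes are assumptions for this example.

```python
import numpy as np

def adain(content_feats, style_feats, eps=1e-5):
    """Adaptive Instance Normalization: re-scale content features so they
    carry the style features' per-channel statistics, while preserving the
    content's spatial layout. Inputs are (channels, height, width) arrays.
    NOTE: illustrative sketch only, not the SASSL paper's implementation."""
    # Per-channel statistics over spatial dimensions.
    c_mu = content_feats.mean(axis=(1, 2), keepdims=True)
    c_std = content_feats.std(axis=(1, 2), keepdims=True)
    s_mu = style_feats.mean(axis=(1, 2), keepdims=True)
    s_std = style_feats.std(axis=(1, 2), keepdims=True)
    # Normalize content, then impose the style's mean and scale.
    normalized = (content_feats - c_mu) / (c_std + eps)
    return s_std * normalized + s_mu

rng = np.random.default_rng(0)
content = rng.normal(2.0, 1.0, size=(3, 8, 8))   # stand-in content features
style = rng.normal(-1.0, 0.5, size=(3, 8, 8))    # stand-in style features
stylized = adain(content, style)
# The stylized features now match the style's per-channel mean.
print(np.allclose(stylized.mean(axis=(1, 2)), style.mean(axis=(1, 2))))
```

In an augmentation pipeline, the stylized output would typically be blended with the original (e.g. `alpha * stylized + (1 - alpha) * content`) to control augmentation strength.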