Self-supervised learning aims to extract meaningful features from unlabeled data for use in downstream tasks. In this paper, we consider classification as the downstream task in phase 2 and develop rigorous theories to identify the factors that implicitly influence the general loss of this classification task. Our theories indicate that sharpness-aware feature extractors benefit the classification task in phase 2, and that the data shift between the ideal distribution (used in theory development) and the practical distribution (used in implementation) for generating positive pairs also markedly affects this task. Building on these theoretical findings, we propose to minimize the sharpness of the feature extractor and introduce a new Fourier-based data augmentation technique to mitigate the data shift in the distributions generating positive pairs, arriving at Sharpness & Shift-Aware Contrastive Learning (SSA-CLR). We conduct extensive experiments to verify our theoretical findings and demonstrate that sharpness & shift-aware contrastive learning remarkably boosts performance and yields more robust extracted features compared with the baselines.