Self-supervised learning is well known for its strong performance in representation learning and in a range of downstream computer vision tasks. Recently, Positive-pair-Only Contrastive Learning (POCL) has achieved reliable performance without the need to construct positive-negative training sets, which reduces memory requirements by lessening the dependency on batch size. POCL methods typically use a single loss function to extract the distortion-invariant representation (DIR), which describes the proximity of positive-pair representations affected by different distortions. This loss function implicitly lets the model filter out or ignore the distortion-variant representation (DVR), i.e., the part of the representation that varies with the distortion applied. However, existing POCL methods do not explicitly enforce the disentanglement and exploitation of the DVR, which is in fact valuable. In addition, these methods have been observed to be sensitive to augmentation strategies. To address these limitations, we propose a novel POCL framework named Distortion-Disentangled Contrastive Learning (DDCL), together with a Distortion-Disentangled Loss (DDL). Our approach is the first to explicitly disentangle and exploit the DVR inside the model and feature stream in order to improve overall representation utilization efficiency, robustness, and representation ability. Experiments demonstrate the superiority of our framework over Barlow Twins and SimSiam in terms of convergence, representation quality, and robustness on several benchmark datasets.
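To make the DIR/DVR distinction concrete, the following is a minimal illustrative sketch, not the DDL as defined in the paper: the abstract does not give the form of the loss. It assumes, hypothetically, that each embedding is split at a fixed index into a DIR part and a DVR part, that the DIR parts of a positive pair are pulled together with a cosine term, and that the DVR parts are pushed toward orthogonality. The function names, the `split` parameter, and the orthogonality penalty are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disentangled_loss_sketch(z1, z2, split):
    """Toy positive-pair loss over two views' embeddings z1 and z2.

    Assumption (not from the paper): the first `split` dimensions hold the
    distortion-invariant part (DIR), the remaining ones the
    distortion-variant part (DVR).
    """
    dir1, dvr1 = z1[:split], z1[split:]
    dir2, dvr2 = z2[:split], z2[split:]

    # DIR term: zero when the invariant parts of the two views align.
    invariant_term = 1.0 - cosine(dir1, dir2)

    # DVR term (assumed form): zero when the variant parts are orthogonal.
    variant_term = abs(cosine(dvr1, dvr2))

    return invariant_term + variant_term
```

Under these assumptions, a pair whose DIR parts coincide and whose DVR parts are orthogonal incurs no loss, while a pair with identical DVR parts is penalized. The sketch does not guard against zero-norm vectors, and it uses no negative pairs, which matches the positive-pair-only setting.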