Self-supervised learning is well known for its remarkable performance in representation learning and in various downstream computer vision tasks. Recently, Positive-pair-Only Contrastive Learning (POCL) has achieved reliable performance without the need to construct positive-negative training sets, and it reduces memory requirements by lessening the dependency on batch size. POCL methods typically use a single loss function to extract the distortion-invariant representation (DIR), which describes the proximity of positive-pair representations under different distortions. This loss function implicitly lets the model filter out or ignore the distortion-variant representation (DVR), the part of the representation that changes with the applied distortion. However, existing POCL methods do not explicitly enforce the disentanglement and exploitation of the DVR, which is in fact valuable. In addition, these methods have been observed to be sensitive to augmentation strategies. To address these limitations, we propose a novel POCL framework named Distortion-Disentangled Contrastive Learning (DDCL), together with a Distortion-Disentangled Loss (DDL). Our approach is the first to explicitly disentangle and exploit the DVR inside the model and feature stream in order to improve overall representation utilization efficiency, robustness, and representation ability. Experiments demonstrate the superiority of our framework over Barlow Twins and SimSiam in terms of convergence, representation quality, and robustness on several benchmark datasets.
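To make the DIR/DVR split described in the abstract more concrete, the following is a minimal sketch and not the DDL as defined by the authors. It assumes that the embedding is split by position into a DIR part and a DVR part, that the DIR parts of two views are pulled together with a cosine term, and that the DVR parts are pushed toward orthogonality. The function name `disentangled_loss` and the parameters `dir_dim` and `dvr_weight` are hypothetical, and NumPy stands in for a deep learning framework.

```python
import numpy as np


def _cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def disentangled_loss(z1, z2, dir_dim, dvr_weight=1.0):
    """Toy positive-pair-only loss with a DIR/DVR split (illustrative only).

    z1, z2     : (batch, dim) embeddings of two distorted views of the same images.
    dir_dim    : hypothetical number of leading dimensions treated as DIR.
    dvr_weight : hypothetical weight on the DVR term.
    """
    # Assumption: the first dir_dim dimensions are DIR, the rest are DVR.
    dir1, dvr1 = z1[:, :dir_dim], z1[:, dir_dim:]
    dir2, dvr2 = z2[:, :dir_dim], z2[:, dir_dim:]

    # DIR term: pull the invariant parts of the two views together.
    dir_loss = np.mean(1.0 - _cosine(dir1, dir2))

    # DVR term (assumed form): penalize alignment between the variant parts,
    # which is minimized when they are orthogonal.
    dvr_loss = np.mean(_cosine(dvr1, dvr2) ** 2)

    return dir_loss + dvr_weight * dvr_loss, dir_loss, dvr_loss


# Usage: random embeddings for a batch of 4 images, 8-dimensional, half DIR.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(4, 8))
view2 = rng.normal(size=(4, 8))
total, dir_term, dvr_term = disentangled_loss(view1, view2, dir_dim=4)
print(total, dir_term, dvr_term)
```

In this sketch, no negative pairs are needed: both terms are computed only from the two views of the same images, which is the property the abstract attributes to POCL methods.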