In recent years, discriminative self-supervised methods have made significant strides on a range of visual tasks. Their central idea, learning a data encoder that is robust to data distortions/augmentations, is straightforward yet highly effective. Although many studies have demonstrated the empirical success of these methods, the learned representations can be unstable and can hinder downstream performance. In this study, we analyze discriminative self-supervised methods from a causal perspective to explain these unstable behaviors and propose solutions to overcome them. Our approach draws inspiration from prior works that empirically demonstrate that discriminative self-supervised methods can, to some extent, demix ground-truth causal sources. Unlike previous work on causality-empowered representation learning, we apply our solutions at inference time rather than during training, which improves time efficiency. Through experiments on both controlled and realistic image datasets, we show that our proposed solutions, which involve tempering a linear transformation with controlled synthetic data, are effective in addressing these issues.