Contrastive learning has achieved state-of-the-art performance in various self-supervised learning tasks and even outperforms its supervised counterpart. Despite this empirical success, theoretical understanding of why contrastive learning is superior remains limited. In this paper, under linear representation settings, (i) we prove that contrastive learning outperforms standard autoencoders and generative adversarial networks, two classical generative unsupervised learning methods, in both feature recovery and in-domain downstream tasks; and (ii) we illustrate the impact of labeled data in supervised contrastive learning. The second result provides theoretical support for recent findings that contrastive learning with labels improves the performance of learned representations on in-domain downstream tasks but can harm performance in transfer learning. We verify our theory with numerical experiments.