Self-supervised learning is a popular and powerful method for utilizing large amounts of unlabeled data, for which a wide variety of training objectives have been proposed in the literature. In this study, we perform a Bayesian analysis of state-of-the-art self-supervised learning objectives and propose a unified formulation based on likelihood learning. Our analysis suggests a simple method for integrating self-supervised learning with generative models, allowing for the joint training of these two seemingly distinct approaches. We refer to this combined framework as GEDI, which stands for GEnerative and DIscriminative training. Additionally, we demonstrate an instantiation of the GEDI framework by integrating an energy-based model with a cluster-based self-supervised learning model. Through experiments on synthetic data and on real-world data, including SVHN, CIFAR10, and CIFAR100, we show that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a wide margin. We also demonstrate that GEDI can be integrated into a neural-symbolic framework to address tasks in the small data regime, where it can use logical constraints to further improve clustering and classification performance.
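To make the idea of joint generative and discriminative training concrete, the following is a minimal sketch and not the GEDI objective itself, which the abstract does not spell out. It assumes one common way of coupling a cluster head with an energy-based model: the cluster logits f(x) define the posterior p(y|x) = softmax(f(x)) and, at the same time, an energy E(x) = -logsumexp(f(x)). The function names, the contrastive form of the generative term, and the `weight` parameter are hypothetical choices made for illustration.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def cluster_posterior(logits):
    """Discriminative view: p(y | x) = softmax(f(x))."""
    lse = log_sum_exp(logits)
    return [math.exp(v - lse) for v in logits]


def energy(logits):
    """Generative view (assumed): E(x) = -logsumexp(f(x)), so p(x) is proportional to exp(-E(x))."""
    return -log_sum_exp(logits)


def joint_loss(logits_data, logits_sample, target, weight=1.0):
    """Hypothetical joint loss: discriminative term + weight * generative term.

    logits_data   -- f(x) for a real input x
    logits_sample -- f(x') for a sample x' drawn from the model (e.g. by MCMC)
    target        -- target cluster distribution for x (e.g. from another view)
    """
    posterior = cluster_posterior(logits_data)
    # Cross-entropy between the target assignment and the predicted posterior.
    disc = -sum(t * math.log(p) for t, p in zip(target, posterior) if t > 0)
    # Contrastive surrogate for the negative log-likelihood of an EBM:
    # push energy down on data and up on model samples.
    gen = energy(logits_data) - energy(logits_sample)
    return disc + weight * gen


logits_x = [1.0, 2.0, 3.0]
logits_x_model = [0.5, 0.5, 0.5]
print(cluster_posterior(logits_x))
print(joint_loss(logits_x, logits_x_model, target=[0.0, 0.0, 1.0]))
```

Because both terms are computed from the same logits, a single network can be trained on their sum. In practice the model samples would come from a sampler such as stochastic gradient Langevin dynamics, which is left out of this sketch.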