In this paper, we propose ProSMIN, a novel probabilistic self-supervised learning method via Scoring Rule Minimization, which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our approach involves two neural networks, the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. Each input sample is presented in two augmented views, and the online network is trained to predict the target network's representation of the same sample under the other view. The two networks are trained with a new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN's convergence by demonstrating the strict propriety of its modified scoring rule. This result validates the method's optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, including in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets such as ImageNet-O and ImageNet-C, which demonstrates ProSMIN's scalability and real-world applicability.
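To make the idea of a scoring-rule loss concrete, the following is a minimal sketch, not ProSMIN's actual objective. The abstract does not specify which scoring rule is used or how it is modified, so the sketch assumes the energy score, a standard strictly proper scoring rule for exponents strictly between 0 and 2, estimated from samples. The function name `energy_score`, the exponent `beta`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def energy_score(samples, target, beta=1.0):
    """Sample estimate of the energy score (lower is better).

    Assumption: the energy score stands in for the paper's scoring rule.
    samples: (m, d) draws from the online network's predictive distribution
    target:  (d,) representation produced by the target network
    """
    m = samples.shape[0]
    # Term 1: mean distance between predictive draws and the target.
    fit = np.linalg.norm(samples - target, axis=1) ** beta
    # Term 2: mean pairwise distance among draws; it rewards spread.
    diffs = samples[:, None, :] - samples[None, :, :]
    pair = np.linalg.norm(diffs, axis=2) ** beta
    # The diagonal is zero, so dividing by m * (m - 1) averages over j != k.
    spread = pair.sum() / (m * (m - 1))
    return fit.mean() - 0.5 * spread

# Toy data (hypothetical): an 8-dimensional target representation.
rng = np.random.default_rng(0)
target = rng.normal(size=8)
good = target + 0.1 * rng.normal(size=(64, 8))       # draws centred on the target
bad = target + 3.0 + 0.1 * rng.normal(size=(64, 8))  # draws shifted away from it
collapsed = np.tile(target + 3.0, (64, 1))           # identical draws, zero spread

print(energy_score(good, target) < energy_score(bad, target))
```

In this sketch, draws centred on the target score lower than draws shifted away from it. The second term vanishes for the collapsed draws, so a point-mass prediction gets no credit for spread. This is one intuition for how a proper scoring rule can discourage collapsed representations.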