In semi-supervised learning, the paradigm of self-training refers to the idea of learning from pseudo-labels suggested by the learner itself. Methods following this paradigm have proven effective across various domains and achieve state-of-the-art performance. However, pseudo-labels typically stem from ad-hoc heuristics that rely on the quality of the learner's predictions without guaranteeing their validity. One such method, so-called credal self-supervised learning, maintains pseudo-supervision in the form of sets of (instead of single) probability distributions over labels, thereby allowing for flexible yet uncertainty-aware labeling. However, this approach, too, lacks any justification beyond its empirical effectiveness. To address this deficiency, we make use of conformal prediction, an approach that comes with guarantees on the validity of set-valued predictions. As a result, the construction of credal sets of labels is supported by a rigorous theoretical foundation, leading to better-calibrated and less error-prone supervision for unlabeled data. In addition, we present effective algorithms for learning from credal self-supervision. An empirical study demonstrates excellent calibration properties of the pseudo-supervision, as well as the competitiveness of our method on several benchmark datasets.
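To make the role of conformal prediction more concrete, here is a minimal sketch of split (inductive) conformal prediction for a classifier, followed by one way of turning its output into a credal set of label distributions. Several choices are assumptions for illustration only and are not taken from the method described above. The nonconformity score is one minus the predicted probability of a candidate label. The labeled calibration set is held out from training. The credal set is induced by a possibility distribution obtained by rescaling the conformal p-values so that the most plausible label has plausibility 1. All function names (`nonconformity`, `conformal_p_values`, `prediction_set`, `plausibility`, `in_credal_set`) are hypothetical. Under exchangeability of calibration and test data, the set of labels whose p-value exceeds a significance level ε contains the true label with probability at least 1 - ε.

```python
from itertools import combinations
from typing import List, Sequence


def nonconformity(probs: Sequence[float], label: int) -> float:
    """Hypothetical score: low when the model assigns high probability to `label`."""
    return 1.0 - probs[label]


def conformal_p_values(calib_probs: Sequence[Sequence[float]],
                       calib_labels: Sequence[int],
                       test_probs: Sequence[float]) -> List[float]:
    """Split-conformal p-value for every candidate label of one unlabeled instance."""
    calib_scores = [nonconformity(p, y) for p, y in zip(calib_probs, calib_labels)]
    n = len(calib_scores)
    p_values = []
    for y in range(len(test_probs)):
        s = nonconformity(test_probs, y)
        # Count calibration scores at least as nonconforming as the candidate;
        # the +1 terms account for the test instance itself.
        count = sum(1 for c in calib_scores if c >= s)
        p_values.append((count + 1) / (n + 1))
    return p_values


def prediction_set(p_values: Sequence[float], epsilon: float) -> List[int]:
    """Labels not rejected at significance level epsilon."""
    return [y for y, p in enumerate(p_values) if p > epsilon]


def plausibility(p_values: Sequence[float]) -> List[float]:
    """Assumed normalization: rescale so the most plausible label gets plausibility 1."""
    top = max(p_values)
    return [p / top for p in p_values]


def in_credal_set(q: Sequence[float], pi: Sequence[float], tol: float = 1e-12) -> bool:
    """Check q(A) <= max_{y in A} pi(y) for every nonempty label subset A.

    Brute force over all subsets, so only suitable for a handful of classes.
    """
    labels = range(len(q))
    for k in range(1, len(q) + 1):
        for subset in combinations(labels, k):
            if sum(q[y] for y in subset) > max(pi[y] for y in subset) + tol:
                return False
    return True


if __name__ == "__main__":
    # Toy calibration data: predicted class probabilities and true labels.
    calib_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
                   [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
    calib_labels = [0, 0, 1, 2, 1]
    p = conformal_p_values(calib_probs, calib_labels, [0.8, 0.15, 0.05])
    print(p)                       # one p-value per candidate label
    print(prediction_set(p, 0.2))  # labels kept at epsilon = 0.2
    pi = plausibility(p)
    print(in_credal_set([0.9, 0.05, 0.05], pi))
    print(in_credal_set([0.8, 0.1, 0.1], pi))
```

In this toy run, label 0 receives p-value 1 and the other two labels receive 1/6 each, so at ε = 0.2 only label 0 is kept. The distribution (0.9, 0.05, 0.05) lies in the induced credal set. The distribution (0.8, 0.1, 0.1) does not, because it puts a total mass of 0.2 on labels whose joint plausibility is only 1/6. With only five calibration points the smallest attainable p-value is 1/6, so any ε below that keeps every label. Realistic calibration sets are much larger.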