In open-set semi-supervised learning (OSSL), we consider unlabeled datasets that may contain unknown classes. Existing OSSL methods often use the softmax confidence to classify data as in-distribution (ID) or out-of-distribution (OOD). Additionally, many OSSL methods rely on ad-hoc thresholds for ID/OOD classification, without considering the statistics of the problem. We propose a new score for ID/OOD classification based on angles in feature space between data and an ID subspace. Moreover, we propose an approach to estimate the conditional distributions of scores given ID or OOD data, enabling probabilistic predictions of data being ID or OOD. These components are combined in a framework for OSSL, termed ProSub, that is experimentally shown to reach state-of-the-art performance on several benchmark problems. Our code is available at https://github.com/walline/prosub.
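To make the two ingredients concrete, the following is a minimal NumPy sketch, not the ProSub implementation. It assumes the ID subspace is spanned by the top right singular vectors of a matrix of ID features, and it assumes Gaussian conditional score distributions purely for illustration; the abstract specifies neither choice. All function names (`id_subspace`, `subspace_angle`, `prob_id`) and parameters are hypothetical.

```python
import numpy as np


def id_subspace(features, dim):
    """Return an orthonormal basis (d x dim) for an assumed ID subspace.

    Assumption: the subspace is spanned by the top `dim` right singular
    vectors of the (n x d) matrix of ID features.
    """
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:dim].T


def subspace_angle(z, basis):
    """Angle in radians between each row of z and the subspace.

    cos(angle) = ||projection of z onto the subspace|| / ||z||.
    """
    proj_norm = np.linalg.norm(z @ basis, axis=1)
    z_norm = np.maximum(np.linalg.norm(z, axis=1), 1e-12)
    return np.arccos(np.clip(proj_norm / z_norm, 0.0, 1.0))


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian (illustrative choice of family)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def prob_id(score, id_params, ood_params, prior_id=0.5):
    """Posterior probability of ID given a score, via Bayes' rule.

    id_params and ood_params are (mean, std) pairs of the assumed
    conditional score distributions p(score | ID) and p(score | OOD).
    """
    p_id = gaussian_pdf(score, *id_params) * prior_id
    p_ood = gaussian_pdf(score, *ood_params) * (1.0 - prior_id)
    return p_id / np.maximum(p_id + p_ood, 1e-300)


# Usage on synthetic features: ID data lies close to a 2-D plane in R^8.
rng = np.random.default_rng(0)
plane = rng.normal(size=(2, 8))
id_feats = rng.normal(size=(200, 2)) @ plane + 0.05 * rng.normal(size=(200, 8))
ood_feats = rng.normal(size=(200, 8))

basis = id_subspace(id_feats, dim=2)
id_scores = subspace_angle(id_feats, basis)
ood_scores = subspace_angle(ood_feats, basis)

# Fit the assumed Gaussians from the two score samples, then predict.
id_params = (id_scores.mean(), id_scores.std())
ood_params = (ood_scores.mean(), ood_scores.std())
posterior = prob_id(ood_scores, id_params, ood_params)
```

A small angle means the feature vector lies nearly inside the ID subspace, so low scores suggest ID data. In a real OSSL setting the OOD score distribution cannot be fitted from labeled OOD samples as done here; estimating both conditionals from unlabeled data is the part the paper's method addresses.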