In open-set semi-supervised learning (OSSL), we consider unlabeled datasets that may contain unknown classes. Existing OSSL methods often use softmax confidence to classify data as in-distribution (ID) or out-of-distribution (OOD). Additionally, many OSSL methods rely on ad-hoc thresholds for ID/OOD classification, without considering the statistics of the problem. We propose a new score for ID/OOD classification based on the angle in feature space between a data point and an ID subspace. Moreover, we propose an approach to estimate the conditional distributions of the score given ID or OOD data, enabling probabilistic predictions of whether data are ID or OOD. These components are combined in an OSSL framework, termed \emph{ProSub}, which we show experimentally to achieve state-of-the-art performance on several benchmark problems. Our code is available at https://github.com/walline/prosub.
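The two components described above, an angle-to-subspace score and a probabilistic ID/OOD prediction from estimated score distributions, can be illustrated with a minimal sketch. Several assumptions are made here that are not taken from the paper. The ID subspace is taken as the span of the top-k right singular vectors of ID feature vectors. The score is the angle arccos(||Pz|| / ||z||) between a feature z and its orthogonal projection Pz onto that subspace. The conditional score distributions are modelled by a two-component 1-D Gaussian mixture fitted with EM, where the smaller-mean component is treated as ID. All function names (`id_subspace_basis`, `subspace_angle`, `fit_two_gaussian_em`, `prob_id`) are hypothetical. This is not the ProSub implementation, whose subspace construction and distribution estimator may differ.

```python
import numpy as np


def id_subspace_basis(id_features: np.ndarray, k: int) -> np.ndarray:
    """Return a (d, k) orthonormal basis for a k-dim subspace fitted to ID features.

    Assumption (illustrative only): the subspace is spanned by the top-k right
    singular vectors of the (n, d) matrix of ID feature vectors.
    """
    _, _, vt = np.linalg.svd(id_features, full_matrices=False)
    return vt[:k].T


def subspace_angle(z: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Angle in radians between each row of z and span(basis); basis is orthonormal."""
    coords = z @ basis  # projection coordinates; their norm equals ||P z||
    cos = np.linalg.norm(coords, axis=1) / np.linalg.norm(z, axis=1)
    return np.arccos(np.clip(cos, 0.0, 1.0))


def _log_joint(s, weights, means, variances):
    """log(pi_c * N(s | mu_c, var_c)) for each score and each of the two components."""
    return (np.log(weights)
            - 0.5 * np.log(2 * np.pi * variances)
            - (s[:, None] - means) ** 2 / (2 * variances))


def fit_two_gaussian_em(s: np.ndarray, n_iter: int = 200):
    """Fit a two-component 1-D Gaussian mixture to scores with EM (illustrative stand-in)."""
    s = np.asarray(s, dtype=float)
    means = np.percentile(s, [25, 75]).astype(float)
    variances = np.full(2, s.var() + 1e-6)
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities computed in log space for numerical stability.
        log_joint = _log_joint(s, weights, means, variances)
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0)
        weights = nk / len(s)
        means = (resp * s[:, None]).sum(axis=0) / nk
        variances = (resp * (s[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances


def prob_id(s, weights, means, variances) -> np.ndarray:
    """Posterior P(ID | score), treating the smaller-mean component as ID."""
    log_joint = _log_joint(np.asarray(s, dtype=float), weights, means, variances)
    log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
    id_comp = int(np.argmin(means))  # assumption: ID data lie at smaller angles
    return np.exp(log_joint[:, id_comp] - log_norm)


# Synthetic usage: ID features lie near a 4-dim subspace of R^16; OOD features do not.
rng = np.random.default_rng(0)
d, k = 16, 4
true_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
id_feats = rng.normal(size=(500, k)) @ true_basis.T + 0.05 * rng.normal(size=(500, d))
ood_feats = rng.normal(size=(500, d))

basis = id_subspace_basis(id_feats, k)
scores = subspace_angle(np.vstack([id_feats, ood_feats]), basis)
weights, means, variances = fit_two_gaussian_em(scores)
p_id = prob_id(scores, weights, means, variances)
```

In this toy setting the ID angles cluster near zero and the OOD angles cluster near one radian, so the fitted mixture yields posteriors close to 1 for ID points and close to 0 for OOD points. The point of the probabilistic step is that the ID/OOD decision comes from estimated score distributions rather than a hand-picked threshold on the angle.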