Open-set semi-supervised learning (OSSL) leverages unlabeled data containing both in-distribution (ID) and unknown out-of-distribution (OOD) samples, aiming to simultaneously improve closed-set accuracy and detect novel OOD instances. Existing methods either discard valuable information from uncertain samples or force-align every unlabeled sample to one or a few synthetic "catch-all" representations, resulting in geometric collapse and overconfidence that is limited to previously seen OODs. To address these limitations, we introduce selective non-alignment, adding a novel "skip" operator to the conventional pull and push operations of contrastive learning. Our framework, SkipAlign, selectively skips alignment (pulling) for low-confidence unlabeled samples, retaining only gentle repulsion against ID prototypes. This approach transforms uncertain samples into a pure repulsion signal, yielding tighter ID clusters and naturally dispersed OOD features. Extensive experiments demonstrate that SkipAlign significantly outperforms state-of-the-art methods in detecting unseen OOD data without sacrificing ID classification accuracy.
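The selective non-alignment idea can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's implementation: the confidence threshold `tau`, the repulsion weight `lam`, the use of cosine similarity against class prototypes, and the hinge-style repulsion term are all hypothetical choices made for clarity. It shows the core control flow: confident samples receive both pull (alignment) and push, while low-confidence samples skip the pull and contribute only a gentle repulsion signal.

```python
import numpy as np

def skipalign_loss(features, prototypes, probs, tau=0.95, lam=0.1):
    """Illustrative sketch of selective non-alignment (not the official code).

    features:   (N, D) L2-normalized embeddings of unlabeled samples
    prototypes: (K, D) L2-normalized ID class prototypes
    probs:      (N, K) classifier softmax probabilities
    tau, lam:   hypothetical confidence threshold and repulsion weight
    """
    sims = features @ prototypes.T          # (N, K) cosine similarities
    conf = probs.max(axis=1)                # per-sample confidence
    pseudo = probs.argmax(axis=1)           # pseudo-labels for confident samples

    loss = 0.0
    for i in range(len(features)):
        if conf[i] >= tau:
            # Confident sample: pull toward its pseudo-label prototype,
            # push away from the remaining ID prototypes.
            pull = 1.0 - sims[i, pseudo[i]]
            push = np.sum(np.maximum(0.0, np.delete(sims[i], pseudo[i])))
            loss += pull + lam * push
        else:
            # Low-confidence sample: SKIP alignment entirely; keep only a
            # gentle repulsion against all ID prototypes (pure repulsion).
            loss += lam * np.sum(np.maximum(0.0, sims[i]))
    return loss / len(features)
```

Note that with `lam = 0` a low-confidence sample contributes nothing to the loss, which is precisely the "skip" behavior: uncertainty is never converted into a forced alignment target.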