Recent advances in robust semi-supervised learning (SSL) typically filter out-of-distribution (OOD) information at the sample level. We argue that an overlooked problem in robust SSL is the corruption of information at the semantic level, which in practice limits progress in the field. In this paper, we take an initial step toward addressing this problem and propose a unified framework termed OOD Semantic Pruning (OSP), which aims to prune OOD semantics from in-distribution (ID) features. Specifically, (i) we propose an aliasing OOD matching module that pairs each ID sample with an OOD sample with which it overlaps semantically. (ii) We design a soft orthogonality regularization, which first transforms each ID feature by suppressing its semantic component that is collinear with the paired OOD sample, and then forces the predictions before and after this soft orthogonality decomposition to be consistent. Despite its simplicity, our method shows strong performance in OOD detection and ID classification on challenging benchmarks. In particular, OSP surpasses the previous state of the art by 13.7% in accuracy for ID classification and 5.9% in AUROC for OOD detection on the TinyImageNet dataset. The source code is publicly available at https://github.com/rain305f/OSP.
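The two components described above can be illustrated with a minimal sketch. It is not the released OSP implementation. The names `match_ood`, `soft_orthogonalize`, `consistency_loss`, and the `strength` parameter are hypothetical. The sketch also assumes cosine similarity as the matching criterion, a linear classifier, and a KL divergence as the consistency term, none of which the abstract specifies.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v, eps=1e-12):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + eps)

def match_ood(id_feat, ood_bank):
    """Assumed matching rule: pick the OOD feature most similar to the ID feature."""
    return max(range(len(ood_bank)), key=lambda i: cosine(id_feat, ood_bank[i]))

def soft_orthogonalize(id_feat, ood_feat, strength=1.0, eps=1e-12):
    """Suppress the component of id_feat that is collinear with ood_feat.

    strength=1.0 removes the collinear component entirely (hard projection);
    smaller values only attenuate it; 0.0 leaves the feature unchanged.
    """
    coeff = dot(id_feat, ood_feat) / (dot(ood_feat, ood_feat) + eps)
    return [f - strength * coeff * o for f, o in zip(id_feat, ood_feat)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(weights, id_feat, ood_feat, strength=1.0):
    """KL divergence between predictions before and after pruning (assumed form)."""
    pruned = soft_orthogonalize(id_feat, ood_feat, strength)
    p = softmax([dot(row, id_feat) for row in weights])
    q = softmax([dot(row, pruned) for row in weights])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy usage: one 2-D ID feature, a bank of two OOD features, a 2-class linear head.
id_feat = [3.0, 4.0]
ood_bank = [[0.0, 1.0], [1.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]

idx = match_ood(id_feat, ood_bank)
pruned = soft_orthogonalize(id_feat, ood_bank[idx])
loss = consistency_loss(weights, id_feat, ood_bank[idx])
```

Minimizing a consistency term of this kind pushes the classifier to rely on the part of the ID feature that survives pruning, which is the intuition behind forcing the predictions before and after the decomposition to agree.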