Recent advances in robust semi-supervised learning (SSL) typically filter out-of-distribution (OOD) information at the sample level. We argue that an overlooked problem in robust SSL is the corruption of information at the semantic level, which in practice limits the development of the field. In this paper, we take an initial step toward exploring this problem and propose a unified framework, termed OOD Semantic Pruning (OSP), which aims to prune OOD semantics from in-distribution (ID) features. Specifically, (i) we propose an aliasing OOD matching module that pairs each ID sample with an OOD sample that overlaps with it semantically. (ii) We design a soft orthogonality regularization, which first transforms each ID feature by suppressing its semantic component that is collinear with the paired OOD sample, and then forces the predictions before and after this soft orthogonality decomposition to be consistent. Despite its simplicity, our method achieves strong performance in both OOD detection and ID classification on challenging benchmarks. In particular, on the TinyImageNet dataset, OSP surpasses the previous state of the art by 13.7% in ID classification accuracy and by 5.9% in OOD detection AUROC. The source code is publicly available at https://github.com/rain305f/OSP.
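The two components described above can be illustrated with a minimal, dependency-free Python sketch. It rests on assumptions that the abstract does not state: semantic overlap is approximated by cosine similarity between feature vectors, the "soft" suppression is controlled by a hypothetical scalar `alpha` in [0, 1] that scales the projection onto the paired OOD feature (so `alpha = 1` is a full orthogonal projection), and the consistency term is taken to be a KL divergence between the softmax outputs of a toy linear classifier. All function names (`match_ood`, `soft_orthogonalize`, `consistency_loss`) are illustrative and are not taken from the OSP repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def match_ood(id_feats: Sequence[Vector], ood_feats: Sequence[Vector]) -> List[int]:
    """Hypothetical matching: pair each ID feature with the index of the
    OOD feature it is most similar to (cosine similarity as a proxy for
    semantic overlap)."""
    return [
        max(range(len(ood_feats)), key=lambda j: cosine(f, ood_feats[j]))
        for f in id_feats
    ]


def soft_orthogonalize(f: Vector, o: Vector, alpha: float) -> Vector:
    """Subtract a fraction alpha of f's component collinear with o.
    alpha = 1 makes the result exactly orthogonal to o; alpha = 0 keeps f."""
    denom = dot(o, o)
    if denom == 0.0:
        return list(f)
    coef = alpha * dot(f, o) / denom
    return [fi - coef * oi for fi, oi in zip(f, o)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def linear_logits(weights: Sequence[Vector], f: Vector) -> List[float]:
    # Toy linear classifier: one weight vector per class.
    return [dot(w, f) for w in weights]


def kl_div(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(weights: Sequence[Vector], f: Vector, o: Vector, alpha: float) -> float:
    """KL(p || q) between predictions on the original ID feature and on its
    soft-orthogonalized version (the divergence choice is an assumption)."""
    p = softmax(linear_logits(weights, f))
    q = softmax(linear_logits(weights, soft_orthogonalize(f, o, alpha)))
    return kl_div(p, q)


if __name__ == "__main__":
    id_feats = [[2.0, 1.0], [0.0, 2.0]]
    ood_feats = [[1.0, 0.0], [0.0, 1.0]]
    pairs = match_ood(id_feats, ood_feats)
    W = [[1.0, 0.0], [0.0, 1.0]]
    for f, j in zip(id_feats, pairs):
        print(j, soft_orthogonalize(f, ood_feats[j], 1.0), consistency_loss(W, f, ood_feats[j], 1.0))
```

Minimizing the consistency loss in this toy setting pushes the classifier to rely on the part of the ID feature that survives removal of the OOD-aligned direction; how OSP actually weights or learns the suppression should be checked against the released code.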