Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to identify samples that lie relatively far from the in-distribution (ID) data. Despite their promise, distance-based methods can suffer from the curse of dimensionality, which limits their efficacy in high-dimensional feature spaces. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. During training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e., a subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.
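To make the distance-based setting and the FPR95 metric concrete, the following is a minimal sketch, not the SNN method itself: it scores each test sample by its distance to the k-th nearest ID training feature and then measures FPR95. The function names `knn_ood_score` and `fpr_at_95_tpr`, the L2 normalization, the choice of `k`, and the synthetic Gaussian features are all assumptions made for illustration. The subspace regularization described in the abstract is applied during training and is not reproduced here.

```python
import numpy as np


def knn_ood_score(train_feats, test_feats, k=5):
    """Distance to the k-th nearest ID training feature (larger = more OOD-like).

    Illustrative only: plain k-NN distance on L2-normalized features,
    without any subspace selection.
    """
    # Normalize rows to unit length so distances compare directions.
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    # k-th smallest distance in each row.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when 95% of ID samples are accepted."""
    # Threshold below which 95% of the ID scores fall.
    threshold = np.percentile(id_scores, 95)
    return float(np.mean(ood_scores <= threshold))


if __name__ == "__main__":
    # Synthetic stand-ins for learned features (hypothetical data).
    rng = np.random.default_rng(0)
    id_train = rng.normal(3.0, 1.0, size=(200, 16))
    id_test = rng.normal(3.0, 1.0, size=(50, 16))
    ood_test = rng.normal(-3.0, 1.0, size=(50, 16))

    id_scores = knn_ood_score(id_train, id_test)
    ood_scores = knn_ood_score(id_train, ood_test)
    print("FPR95:", fpr_at_95_tpr(id_scores, ood_scores))
```

A lower FPR95 means fewer OOD samples slip through at the threshold that keeps 95% of ID samples, which is the quantity the reported 15.96% reduction refers to.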