Out-of-distribution (OOD) detection is indispensable for safely deploying machine learning models in the wild. One of the key challenges is that models lack supervision signals from unknown data and can therefore produce overconfident predictions on OOD data. Recent work on outlier synthesis modeled the feature space as a parametric Gaussian distribution, a strong and restrictive assumption that might not hold in reality. In this paper, we propose a novel framework, Non-Parametric Outlier Synthesis (NPOS), which generates artificial OOD training data and facilitates learning a reliable decision boundary between in-distribution (ID) and OOD data. Importantly, our proposed synthesis approach does not make any distributional assumption on the ID embeddings, thereby offering strong flexibility and generality. We show that our synthesis approach can be mathematically interpreted as a rejection sampling framework. Extensive experiments show that NPOS achieves superior OOD detection performance, outperforming competitive baselines by a significant margin. Code is publicly available at https://github.com/deeplearning-wisc/npos.
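To make the rejection-sampling reading concrete, the following is a minimal sketch of one way outliers could be synthesized without fitting a parametric density. It is an illustration under stated assumptions, not the NPOS algorithm itself: the k-nearest-neighbor distance score, the Gaussian proposal around ID embeddings, the quantile threshold, and all names and defaults (`knn_distance`, `synthesize_outliers`, `k`, `sigma`, `quantile`) are hypothetical choices made here. The released code at the URL above is the reference for the actual method.

```python
import numpy as np


def knn_distance(queries, reference, k):
    """Euclidean distance from each query to its k-th nearest reference point."""
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # shape (n_queries, n_reference)
    # k is 1-indexed, so the k-th smallest value sits at index k - 1.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def synthesize_outliers(id_embeddings, n_outliers, k=5, sigma=1.0,
                        quantile=0.95, seed=0, max_rounds=100):
    """Rejection-sample synthetic outliers around ID embeddings.

    Assumption (not from the paper): a candidate is accepted when its
    k-NN distance to the ID set exceeds a high quantile of the ID
    points' own k-NN distances, i.e. it lies in a low-density region.
    """
    rng = np.random.default_rng(seed)
    # Use k + 1 here because each ID point is its own nearest neighbor.
    id_scores = knn_distance(id_embeddings, id_embeddings, k + 1)
    threshold = np.quantile(id_scores, quantile)

    accepted, count = [], 0
    for _ in range(max_rounds):
        # Proposal: perturb randomly chosen ID embeddings with Gaussian noise.
        idx = rng.integers(len(id_embeddings), size=n_outliers)
        candidates = id_embeddings[idx] + sigma * rng.standard_normal(
            (n_outliers, id_embeddings.shape[1]))
        # Rejection step: keep only candidates far from the ID data.
        scores = knn_distance(candidates, id_embeddings, k)
        keep = candidates[scores > threshold]
        accepted.append(keep)
        count += len(keep)
        if count >= n_outliers:
            break
    return np.concatenate(accepted)[:n_outliers], threshold


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    id_embeddings = rng.standard_normal((200, 8))  # stand-in for real features
    outliers, threshold = synthesize_outliers(id_embeddings, n_outliers=50)
    print(outliers.shape, float(threshold))
```

The acceptance rule only compares distances to the observed ID embeddings, so no distributional form is imposed on them; the Gaussian noise is just a proposal mechanism and could be swapped for another one.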