Out-of-distribution (OOD) inputs can compromise the performance and safety of real-world machine learning systems. Many OOD detection methods exist and work well on small-scale datasets with low resolution and few classes, but few have been developed for large-scale OOD detection. Existing large-scale methods, including the state-of-the-art grouped softmax method, generally depend on the maximum classification probability. In this work, we develop a novel approach that calculates the probability of the predicted class label based on label distributions learned during training. Our method outperforms current state-of-the-art methods with only a negligible increase in compute cost. We evaluate it against contemporary methods across 14 datasets and achieve a statistically significant improvement in AUROC (84.2 vs. 82.4) and AUPR (96.2 vs. 93.7).
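For readers unfamiliar with the baseline the abstract refers to, the sketch below illustrates OOD scoring by maximum classification probability and how an AUROC figure can be computed from such scores. It is not the method proposed in this work. The function names (`softmax`, `msp_score`, `auroc`), the toy logits, and the convention of treating in-distribution samples as the positive class are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def msp_score(logits):
    """Maximum softmax probability per sample; higher suggests in-distribution."""
    return softmax(logits).max(axis=1)


def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise Mann-Whitney formulation)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


# Hypothetical toy logits: confident predictions for in-distribution inputs,
# nearly flat predictions for OOD inputs.
in_logits = np.array([[5.0, 0.0, 0.0], [0.0, 6.0, 0.0]])
ood_logits = np.array([[1.0, 1.0, 1.0], [0.5, 0.4, 0.6]])

# Assumed convention: in-distribution samples are the positive class.
toy_auroc = auroc(msp_score(in_logits), msp_score(ood_logits))
```

The pairwise AUROC computation is quadratic in the number of samples, which is fine for a toy example; a rank-based implementation would be the usual choice at evaluation scale.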