Out-of-distribution (OOD) inputs can compromise the performance and safety of real-world machine learning systems. Many OOD detection methods work well on small-scale datasets with low resolution and few classes, but few have been developed for large-scale OOD detection. Existing large-scale methods, such as the state-of-the-art grouped softmax method, generally depend on the maximum classification probability. In this work, we develop a novel approach that calculates the probability of the predicted class label based on label distributions learned during training. Our method outperforms current state-of-the-art methods with only a negligible increase in compute cost. We evaluate our method against contemporary methods across $14$ datasets and achieve a statistically significant improvement in AUROC (84.2 vs. 82.4) and AUPR (96.2 vs. 93.7).
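For readers unfamiliar with the baseline family mentioned above, the following is a minimal sketch of a maximum-classification-probability OOD score and a pairwise AUROC computation. It is not the method proposed in this work, whose label-distribution details are not given here. The function names `max_softmax_score` and `auroc` and the toy logits are assumptions made up for illustration.

```python
import numpy as np


def max_softmax_score(logits):
    """Return the maximum softmax probability for each row of logits."""
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


def auroc(scores_in, scores_out):
    """Probability that an in-distribution score exceeds an OOD score.

    Ties count as one half. This pairwise form is quadratic in the
    number of samples, so it is only suitable for small examples.
    """
    diff = scores_in[:, None] - scores_out[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Hypothetical logits: confident predictions for in-distribution inputs,
# nearly flat predictions for OOD inputs.
logits_in = np.array([[6.0, 0.5, 0.2], [0.1, 5.0, 0.3], [0.2, 0.1, 4.0]])
logits_out = np.array([[1.0, 0.9, 1.1], [0.5, 0.6, 0.4]])

s_in = max_softmax_score(logits_in)
s_out = max_softmax_score(logits_out)
print(auroc(s_in, s_out))
```

A higher score is treated as evidence that the input is in-distribution. In this toy case every in-distribution score exceeds every OOD score, so the two sets are perfectly separated.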