Open-world classification systems should discern out-of-distribution (OOD) data whose labels deviate from those of in-distribution (ID) cases, motivating recent studies in OOD detection. Despite promising progress, advanced methods may still fail in the open world because unseen OOD data cannot be known in advance. Although one can access auxiliary OOD data (distinct from the unseen data) for model training, how such auxiliary data affect performance in the open world remains to be analyzed. To this end, we study this problem from a learning-theoretic perspective and find that the distribution discrepancy between the auxiliary and the unseen real OOD data is the key factor affecting open-world detection performance. Accordingly, we propose Distributional-Augmented OOD Learning (DAL), which alleviates the OOD distribution discrepancy by crafting an OOD distribution set that contains all distributions in a Wasserstein ball centered on the auxiliary OOD distribution. We show that a predictor trained on the worst-case OOD data in the ball can shrink the OOD distribution discrepancy, thus improving open-world detection performance given only the auxiliary OOD data. We conduct extensive evaluations across representative OOD detection setups, demonstrating the superiority of DAL over its advanced counterparts.
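As a rough illustration of the idea, training against the worst-case distribution in a Wasserstein ball can be written as a min-max objective. The notation below is assumed for illustration and is not taken from the abstract: $f$ is the predictor, $\ell_{\mathrm{ID}}$ and $\ell_{\mathrm{OOD}}$ are the ID and OOD losses, $P_{\mathrm{ID}}$ and $P_{\mathrm{aux}}$ are the ID and auxiliary OOD distributions, $\mathrm{W}$ is a Wasserstein distance, $\rho \ge 0$ is the ball radius, and $\alpha > 0$ is a trade-off weight.

```latex
% Sketch only: a generic distributionally robust objective, not necessarily the exact DAL formulation.
\min_{f} \;
  \mathbb{E}_{(x,y)\sim P_{\mathrm{ID}}}\big[\ell_{\mathrm{ID}}(f(x), y)\big]
  \;+\; \alpha \sup_{Q \,:\, \mathrm{W}(Q,\, P_{\mathrm{aux}}) \le \rho}
  \mathbb{E}_{x\sim Q}\big[\ell_{\mathrm{OOD}}(f(x))\big]
```

Under this sketch, setting $\rho = 0$ leaves only $P_{\mathrm{aux}}$ in the ball, so the objective reduces to ordinary training with auxiliary OOD data; a larger $\rho$ admits distributions farther from the auxiliary one.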