Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to avoid unreliable predictions. In real-world scenarios, however, the efficacy of existing OOD detection methods is often impeded by the inherent imbalance of in-distribution (ID) data, which causes a significant performance decline. Through statistical observations, we identify two common failure modes shared by different OOD detectors: misidentifying tail-class ID samples as OOD, and erroneously predicting OOD samples as ID head classes. To explain this phenomenon, we introduce a generalized statistical framework, termed ImOOD, that formulates the OOD detection problem over an imbalanced data distribution. Our theoretical analysis reveals a class-aware bias term between balanced and imbalanced OOD detection, which accounts for the performance gap. Building on this finding, we present a unified training-time regularization technique that mitigates this bias and boosts imbalanced OOD detectors across architecture designs. Our theoretically grounded method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks over several state-of-the-art OOD detection approaches. Code will be made publicly available.