Out-of-distribution (OoD) inputs pose a persistent challenge to deep learning models, often triggering overconfident predictions on non-target objects. While prior work has primarily focused on refining scoring functions and adjusting test-time thresholds, such algorithmic improvements offer only incremental gains. We argue that mitigating these risks effectively requires rethinking the entire development lifecycle. This work addresses two overlooked dimensions of OoD detection in object detection. First, we reveal fundamental flaws in widely used evaluation benchmarks: contrary to their design intent, up to 13% of the objects in OoD test sets actually belong to in-distribution classes, and vice versa. These quality issues severely distort the reported performance of existing methods and contribute to their high false positive rates. Second, we introduce a novel training-time mitigation paradigm that operates independently of external OoD detectors. Instead of relying solely on post-hoc scoring, we fine-tune the detector on a carefully synthesized OoD dataset that semantically resembles in-distribution objects. This process shapes a defensive decision boundary by suppressing objectness on OoD objects, yielding a 91% reduction in the hallucination error of a YOLO model on BDD-100K. Our methodology generalizes across detection paradigms such as YOLO, Faster R-CNN, and RT-DETR, and supports few-shot adaptation. Together, these contributions offer a principled and effective way to reduce OoD-induced hallucination in object detectors. Code and data are available at: https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood.
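The training-time suppression idea can be illustrated with a minimal sketch. Note that the function name, the exact loss form, and the use of plain binary cross-entropy are our illustrative assumptions, not the paper's actual implementation: synthesized OoD boxes receive an objectness target of zero while in-distribution boxes keep a target of one, so fine-tuning pushes the detector's objectness down on near-distribution outliers.

```python
import numpy as np

def objectness_suppression_loss(obj_scores, is_ood, eps=1e-7):
    """Binary cross-entropy on predicted objectness probabilities:
    in-distribution boxes are pulled toward 1, synthesized OoD boxes
    toward 0. Illustrative sketch only, not the paper's loss."""
    targets = np.where(is_ood, 0.0, 1.0)
    p = np.clip(obj_scores, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))

# An overconfident prediction on a synthesized OoD object is penalized
# heavily, while an already-suppressed one incurs almost no loss.
confident_ood = objectness_suppression_loss(np.array([0.99]), np.array([True]))
suppressed_ood = objectness_suppression_loss(np.array([0.01]), np.array([True]))
```

In practice such a term would be added, with some weighting, to the detector's standard classification and localization losses during fine-tuning on the mixed in-distribution/synthesized-OoD data.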