LiDAR-based 3D object detection plays a critical role in reliable and safe autonomous driving systems. However, existing detectors often produce overconfident predictions for objects outside the known categories, posing significant safety risks. Such out-of-distribution (OOD) objects were not part of the training data and therefore lead to incorrect predictions. To address this challenge, we propose ALOOD (Aligned LiDAR representations for Out-Of-Distribution Detection), a novel approach that incorporates language representations from a vision-language model (VLM). By aligning the object features of the detector with the feature space of the VLM, we can treat the detection of OOD objects as a zero-shot classification task. We demonstrate competitive performance on the nuScenes OOD benchmark, establishing a novel approach to LiDAR-based OOD object detection using language representations. The source code is available at https://github.com/uulm-mrm/mmood3d.
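The abstract does not specify the scoring rule, but one common way to realize "OOD detection as zero-shot classification" once detector features are aligned to a VLM space is to compare each object feature against text embeddings of the known classes and flag low-confidence matches. The sketch below is a hypothetical illustration of that idea, not the authors' implementation; the function name, the temperature `tau`, and the `1 - max softmax` score are all assumptions.

```python
import numpy as np

def zero_shot_ood_score(obj_feat, text_embs, tau=0.1):
    """Hypothetical zero-shot OOD scoring (not the paper's exact method).

    obj_feat:  (d,) detector object feature projected into the VLM space.
    text_embs: (k, d) VLM text embeddings, one per known (in-distribution) class.
    Returns (ood_score, predicted_class); a high score suggests OOD.
    """
    # Cosine similarity = dot product of L2-normalized vectors.
    f = obj_feat / np.linalg.norm(obj_feat)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ f                        # (k,) similarities to each known class

    # Temperature-scaled softmax over known classes (CLIP-style).
    probs = np.exp(sims / tau)
    probs /= probs.sum()

    # Low maximum confidence over known classes -> likely OOD (assumed rule).
    ood_score = 1.0 - probs.max()
    return float(ood_score), int(np.argmax(sims))
```

An aligned feature that closely matches one class's text embedding yields a low score, while a feature far from all known-class embeddings yields a high score.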