Object detection is crucial for safe autonomous driving. However, data-driven approaches struggle when they encounter minority or novel objects in the 3D driving scene. In this paper, we propose VisLED, a language-driven active learning framework for diverse open-set 3D object detection. Our method uses active learning to query diverse and informative samples from an unlabeled pool, improving the model's ability to detect underrepresented or novel objects. Specifically, we introduce the Vision-Language Embedding Diversity Querying (VisLED-Querying) algorithm, which operates in two settings: open-world exploring and closed-world mining. In open-world exploring, VisLED-Querying selects the data points that are most novel relative to existing data; in closed-world mining, it mines new instances of known classes. We evaluate our approach on the nuScenes dataset against random sampling and entropy querying. Our results show that VisLED-Querying consistently outperforms random sampling and is competitive with entropy querying, even though the latter is optimized for the model being trained, highlighting the potential of VisLED for improving object detection in autonomous driving scenarios.
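To make the open-world exploring idea concrete, the following is a minimal sketch of diversity querying over precomputed embeddings. It is not the VisLED-Querying algorithm itself, whose details are not given in this abstract; it swaps in a generic greedy farthest-point selection under cosine distance. The names `cosine_distance` and `select_novel`, the choice of cosine distance, and the assumption that each scene is already represented by a single vision-language embedding vector are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_u * norm_v)


def select_novel(labeled, unlabeled, budget):
    """Greedily pick indices of `unlabeled` embeddings that are farthest
    from everything already covered (labeled set plus earlier picks).

    This is a generic farthest-point heuristic, used here only to
    illustrate "most novel relative to existing data".
    """
    if not labeled:
        raise ValueError("need at least one labeled embedding as a reference")

    # Distance from each candidate to its nearest labeled embedding.
    nearest = [min(cosine_distance(x, ref) for ref in labeled) for x in unlabeled]

    chosen = []
    for _ in range(min(budget, len(unlabeled))):
        remaining = [i for i in range(len(unlabeled)) if i not in chosen]
        best = max(remaining, key=lambda i: nearest[i])
        chosen.append(best)
        # The new pick now counts as covered, so shrink the other distances.
        for i, x in enumerate(unlabeled):
            nearest[i] = min(nearest[i], cosine_distance(x, unlabeled[best]))
    return chosen


if __name__ == "__main__":
    labeled_pool = [[1.0, 0.0]]
    unlabeled_pool = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(select_novel(labeled_pool, unlabeled_pool, budget=2))
```

In this toy pool, the two candidates that nearly coincide with the labeled embedding are skipped in favor of the two that point in different directions. An entropy-based baseline would instead rank candidates by the detector's predictive uncertainty, which requires a trained model; the embedding-based selection above does not.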