Object detection is crucial for safe autonomous driving. However, data-driven approaches struggle when they encounter minority or novel objects in 3D driving scenes. In this paper, we propose VisLED, a language-driven active learning framework for diverse open-set 3D object detection. Our method uses active learning to query diverse and informative samples from an unlabeled pool, improving the model's ability to detect underrepresented or novel objects. Specifically, we introduce the Vision-Language Embedding Diversity Querying (VisLED-Querying) algorithm, which operates in two settings: open-world exploring and closed-world mining. In open-world exploring, VisLED-Querying selects the data points that are most novel relative to the existing data; in closed-world mining, it mines novel instances of known classes. We evaluate our approach on the nuScenes dataset and demonstrate its efficiency compared with random sampling and entropy querying. Our results show that VisLED-Querying consistently outperforms random sampling and performs competitively with entropy querying despite the latter being model-optimal, highlighting the potential of VisLED for improving object detection in autonomous driving. Our code is publicly available at https://github.com/Bjork-crypto/VisLED-Querying
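The abstract does not specify how VisLED-Querying scores novelty, so the following is only a rough illustration under stated assumptions, not the authors' implementation (the linked repository is the authoritative source). It assumes each sample has already been mapped to a vision-language embedding vector (for instance by a CLIP-style encoder), treats "novelty" as cosine distance to the nearest already-seen embedding, and selects samples greedily in farthest-first order. `novelty_query` sketches the open-world exploring setting; `closed_world_query` sketches closed-world mining by running the same selection separately within each (predicted) known class. Both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def novelty_query(labeled: List[Vector], pool: List[Vector], budget: int) -> List[int]:
    """Hypothetical open-world sketch: greedy farthest-first selection.

    Repeatedly picks the pool item whose nearest neighbour among
    (labeled + already selected) embeddings is farthest away.
    Returns indices into `pool` in selection order.
    """
    # Distance from each pool item to its nearest reference embedding
    # (infinite if nothing is labeled yet).
    min_dist = [
        min((cosine_distance(p, r) for r in labeled), default=math.inf)
        for p in pool
    ]
    selected: List[int] = []
    for _ in range(min(budget, len(pool))):
        # Most novel remaining candidate; ties resolve to the lowest index.
        best = max(
            (i for i in range(len(pool)) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(best)
        # The chosen item now counts as "seen": shrink distances accordingly,
        # so near-duplicates of it become less attractive.
        for i, p in enumerate(pool):
            min_dist[i] = min(min_dist[i], cosine_distance(p, pool[best]))
    return selected


def closed_world_query(
    labeled: List[Vector],
    labeled_cls: List[str],
    pool: List[Vector],
    pool_cls: List[str],
    budget_per_class: int,
) -> Dict[str, List[int]]:
    """Hypothetical closed-world sketch: mine novel instances per known class.

    Novelty is measured only against labeled embeddings of the same class.
    Returns, for each class, indices into `pool`.
    """
    picks: Dict[str, List[int]] = {}
    for c in set(pool_cls):
        ref = [e for e, k in zip(labeled, labeled_cls) if k == c]
        idx = [i for i, k in enumerate(pool_cls) if k == c]
        local = novelty_query(ref, [pool[i] for i in idx], budget_per_class)
        picks[c] = [idx[j] for j in local]
    return picks
```

For example, with `labeled = [[1.0, 0.0]]` and pool `[[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]`, `novelty_query(labeled, pool, 2)` returns `[1, 2]`: the orthogonal vector is picked first and the near-duplicate of the labeled sample is left out. Updating distances after every pick, rather than taking the top-k most novel items at once, keeps each queried batch diverse instead of filling it with several near-identical novel samples.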