Recent years have witnessed great success in 3D object detection for recognizing common objects in autonomous driving (e.g., vehicles and pedestrians). However, most methods rely heavily on large amounts of well-labeled training data. This limits their ability to detect rare fine-grained objects (e.g., police cars and ambulances), which is important in special cases such as emergency rescue. To detect common and rare objects simultaneously, we propose a novel task, called generalized few-shot 3D object detection, in which a large amount of training data is available for common (base) classes but only a few samples for rare (novel) classes. Specifically, we analyze in depth the differences between images and point clouds, and then present a practical principle for the few-shot setting on 3D LiDAR datasets. To solve this task, we propose a simple and effective detection framework, including (1) an incremental fine-tuning method that extends existing 3D detection models to recognize both common and rare objects, and (2) a sample adaptive balance loss that alleviates the long-tailed data distribution in autonomous driving scenarios. On the nuScenes dataset, we conduct extensive experiments to demonstrate that our approach can successfully detect rare (novel) classes that have only a few training samples while maintaining detection accuracy on common objects.