Accurate 3D object detection in real-world environments requires large amounts of high-quality annotated data. Acquiring such data is tedious and expensive, and the effort often has to be repeated when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario for constructing 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share their predictions with the ego agent (e.g., a car). Naively using the received predictions as ground truth to train the ego car's detector, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum that first learns from closer units with similar viewpoints and subsequently improves the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a handful of annotated data, substantially reducing the amount of data needed to train an object detector. We validate our approach on a recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments across several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.
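The ordering step of a distance-based curriculum can be illustrated with a minimal sketch. It assumes each training frame records the 2D positions of the ego car and the reference car; the `Frame` type, the function names, and the distance thresholds are all hypothetical and do not come from the paper. The self-training and pseudo label refinement steps are not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Frame:
    """Hypothetical record of one frame with pseudo labels from a reference car."""
    frame_id: int
    ego_xy: Tuple[float, float]  # ego car position in a shared map frame (assumed)
    ref_xy: Tuple[float, float]  # reference car position in the same frame (assumed)


def ego_ref_distance(frame: Frame) -> float:
    """Euclidean distance between the ego car and the reference car."""
    return math.dist(frame.ego_xy, frame.ref_xy)


def curriculum_stages(frames: Sequence[Frame],
                      thresholds: Sequence[float]) -> List[List[Frame]]:
    """Build one training subset per stage, ordered from near to far.

    Stage k keeps every frame whose ego-to-reference distance is at most the
    k-th smallest threshold, so later stages are supersets of earlier ones.
    """
    return [
        [f for f in frames if ego_ref_distance(f) <= limit]
        for limit in sorted(thresholds)
    ]


# Toy data: distances are 5 m, 50 m, and 10 m respectively.
frames = [
    Frame(0, (0.0, 0.0), (3.0, 4.0)),
    Frame(1, (0.0, 0.0), (30.0, 40.0)),
    Frame(2, (0.0, 0.0), (6.0, 8.0)),
]

# Illustrative thresholds in meters (assumed values, not from the paper).
stages = curriculum_stages(frames, thresholds=[10.0, 60.0])
stage_ids = [[f.frame_id for f in stage] for stage in stages]
```

In this sketch the detector would be trained on the first stage, where the two viewpoints are most similar, before moving on to the larger stage. How the later-stage pseudo labels are improved through self-training is left out because the abstract does not specify it.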