For autonomous vehicles, driving safely depends heavily on the ability to correctly perceive the environment in 3D space; hence, 3D object detection is a fundamental aspect of perception. While 3D sensors deliver accurate metric perception, monocular approaches enjoy cost and availability advantages that are valuable in a wide range of applications. Unfortunately, training monocular methods requires a vast amount of annotated data. Interestingly, self-supervised approaches have recently been applied successfully to ease the training process and unlock access to widely available unlabelled data. While related research leverages different priors, including LIDAR scans and stereo images, such priors again limit usability. Therefore, in this work, we propose a novel approach to self-supervise 3D object detection purely from RGB sequences, leveraging multi-view constraints and weak labels. Our experiments on the KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods that use LIDAR scans or stereo images.