Current 3D object detection models follow a single dataset-specific training and testing paradigm, and their detection accuracy often drops sharply when they are deployed directly on another dataset. In this paper, we study the task of training a unified 3D detector from multiple datasets. We observe that this is a challenging task, mainly because these datasets present substantial data-level differences and taxonomy-level variations caused by different LiDAR types and data acquisition standards. Motivated by this observation, we present Uni3D, which leverages a simple data-level correction operation and a purpose-designed semantic-level coupling-and-recoupling module to alleviate the unavoidable data-level and taxonomy-level differences, respectively. Our method is simple and easily combined with many 3D object detection baselines such as PV-RCNN and Voxel-RCNN, enabling them to learn effectively from multiple off-the-shelf 3D datasets and obtain more discriminative and generalizable representations. Experiments are conducted on several dataset consolidation settings, including Waymo-nuScenes, nuScenes-KITTI, Waymo-KITTI, and Waymo-nuScenes-KITTI. The results demonstrate that Uni3D outperforms a series of individual detectors each trained on a single dataset, with a 1.04x parameter increase over a selected baseline detector. We expect this work to inspire research on 3D generalization, since it pushes the limits of perceptual performance.