Data and model are undoubtedly the two supporting pillars of LiDAR object detection. However, data-centric work lags far behind the ever-growing list of new models. In this work, we systematically study the synthesis-based LiDAR data augmentation approach (so-called GT-Aug), which offers maximum controllability over the generated data samples. We find that the main shortcoming of existing works is that they introduce unrealistic LiDAR scan patterns during GT-Aug. In light of this finding, we propose Real-Aug, a synthesis-based augmentation method that prioritizes generating realistic LiDAR scans. Our method consists of a reality-conforming scene composition module, which handles the details of the composition, and a real-synthesis mixing-up training strategy, which gradually adapts the data distribution from synthetic data to real data. To verify the effectiveness of our method, we conduct extensive ablation studies and validate Real-Aug on a wide range of detector and dataset combinations. We achieve a state-of-the-art 0.744 NDS and 0.702 mAP on the nuScenes test set. The code will be released soon.
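The two ingredients named above can be illustrated with a minimal Python sketch. This sketch is not the Real-Aug implementation. `gt_aug_paste` shows vanilla GT-Aug: it pastes sampled ground-truth boxes into a scene and rejects any box that collides with existing or already-pasted boxes. It uses axis-aligned bird's-eye-view (BEV) boxes for simplicity, whereas real detectors use rotated boxes. It does not model the reality-conforming composition that the abstract adds on top. `MixSchedule` assumes one plausible form of the real-synthesis mixing-up strategy: the probability of applying GT-Aug decays linearly to zero, followed by a final stretch of real-only epochs. All names and the linear schedule are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical axis-aligned BEV box: (x_min, y_min, x_max, y_max)
Box = tuple[float, float, float, float]


def bev_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned BEV boxes intersect (simplified collision test)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def gt_aug_paste(scene: list[Box], candidates: list[Box]) -> list[Box]:
    """Vanilla GT-Aug placement: keep candidates that collide with nothing placed so far."""
    placed = list(scene)
    pasted = []
    for box in candidates:
        if not any(bev_overlap(box, other) for other in placed):
            placed.append(box)
            pasted.append(box)
    return pasted


@dataclass
class MixSchedule:
    """Hypothetical real-synthesis mixing schedule (an assumption, not the paper's rule).

    During the first `total_epochs - real_only_epochs` epochs, the probability of
    applying GT-Aug to a sample decays linearly from `start_prob` to 0. The
    remaining epochs train on unmodified real scans only.
    """
    total_epochs: int
    real_only_epochs: int
    start_prob: float = 1.0

    def synthetic_prob(self, epoch: int) -> float:
        ramp = self.total_epochs - self.real_only_epochs
        if ramp <= 0 or epoch >= ramp:
            return 0.0
        return self.start_prob * (1.0 - epoch / ramp)

    def should_augment(self, epoch: int, rng: random.Random) -> bool:
        # rng.random() is in [0, 1), so a probability of 0.0 never augments
        return rng.random() < self.synthetic_prob(epoch)
```

For example, `MixSchedule(total_epochs=20, real_only_epochs=4)` applies GT-Aug with probability 1.0 at epoch 0, 0.5 at epoch 8, and 0.0 from epoch 16 onward. As a result, the detector sees only real scans at the end of training. A step or cosine decay would be an equally reasonable assumption.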