In the field of autonomous driving, monocular 3D detection is a critical task that estimates the 3D properties (depth, dimensions, and orientation) of objects from a single RGB image. Previous works have used features heuristically to learn 3D properties, without considering that inappropriate features could have adverse effects. In this paper, we introduce sample selection, based on the premise that only suitable samples should be used to train the regression of 3D properties. To select samples adaptively, we propose a Learnable Sample Selection (LSS) module, which is based on Gumbel-Softmax and a relative-distance sample divider. The LSS module operates under a warm-up strategy, which improves training stability. Additionally, because the LSS module, which is dedicated to selecting samples for 3D properties, relies on object-level features, we further develop a data augmentation method named MixUp3D. MixUp3D enriches 3D property samples while conforming to imaging principles and without introducing ambiguity. As two orthogonal methods, the LSS module and MixUp3D can be used independently or together. Extensive experiments show that combining them yields a synergistic effect: the improvement is greater than the sum of their individual gains. Leveraging the LSS module and MixUp3D, and without any extra data, our method, named MonoLSS, ranks 1st in all three categories (Car, Cyclist, and Pedestrian) on the KITTI 3D object detection benchmark. It also achieves competitive results on both the Waymo dataset and the KITTI-nuScenes cross-dataset evaluation. The code is included in the supplementary material and will be released to facilitate related academic and industrial research.
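The abstract does not specify how the relative-distance divider turns Gumbel-perturbed scores into a selection. The Python sketch below shows one plausible reading and rests on several assumptions. Each candidate sample of an object has a learned logit, and standard Gumbel noise is added to each logit. Each perturbed score's distance from the maximum is then normalised by the score range. Samples whose relative distance falls within a `threshold` are kept. During a warm-up period of `warmup_epochs`, every sample is kept. The names `select_samples`, `threshold`, and `warmup_epochs` are hypothetical and are not taken from MonoLSS.

```python
import math
import random


def gumbel_noise(rng):
    """Sample standard Gumbel(0, 1) noise as -log(-log(U)), U ~ Uniform(0, 1)."""
    u = rng.random()
    while u <= 0.0:  # random() can return 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def select_samples(logits, epoch, warmup_epochs=5, threshold=0.5, rng=None):
    """Return a hard 0/1 selection mask over one object's candidate samples.

    Hypothetical divider (an assumption, not the MonoLSS definition):
    relative distance = (max - score) / (max - min) over Gumbel-perturbed
    logits; a sample is kept if its relative distance is <= threshold.
    """
    if epoch < warmup_epochs:
        # Assumed warm-up behaviour: train on all samples before selecting.
        return [1] * len(logits)
    rng = rng or random.Random()
    perturbed = [l + gumbel_noise(rng) for l in logits]
    hi, lo = max(perturbed), min(perturbed)
    span = hi - lo
    if span == 0.0:
        # All scores are equal (e.g. a single candidate): keep everything.
        return [1] * len(logits)
    return [1 if (hi - p) / span <= threshold else 0 for p in perturbed]


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, 1.5]
    print(select_samples(logits, epoch=10, threshold=0.3, rng=random.Random(0)))
```

The hard mask is not differentiable. A trainable version would typically pass gradients through it with a straight-through estimator, as hard Gumbel-Softmax does, and this stdlib-only sketch omits that step.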