To achieve reliable and precise scene understanding, autonomous vehicles typically incorporate multiple sensing modalities to capitalize on their complementary attributes. However, existing cross-modal 3D detectors do not fully exploit image-domain information to address the bottlenecks of LiDAR-based detectors. This paper presents a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects. First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation, which enables the learning of local, spatially aware features from the image modality to supplement sparse point clouds. Second, we discover that the representational capability of the point cloud backbone can be enhanced by the gradients backpropagated from the training objectives of the image branch through a succinct and effective point-to-pixel module. Extensive experiments and ablation studies validate the effectiveness of our method. Notably, at the time of submission, our method ranked first in the highly competitive cyclist class of the KITTI benchmark. The source code is available at https://github.com/Eaphan/UPIDet.
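The abstract does not spell out how normalized local coordinate (NLC) targets are constructed. What follows is a minimal pure-Python sketch under assumed conventions, not UPIDet's implementation. A foreground point's NLC is taken to be its position in the frame of the ground-truth box containing it, scaled so the box spans [0, 1]^3. The sparse NLC map stores that value at the pixel the point projects to. The box layout `(cx, cy, cz, l, w, h, yaw)`, the single 3x4 matrix `P` (assumed to already combine LiDAR-to-camera extrinsics and intrinsics), and the helper names `normalized_local_coord`, `project_to_pixel`, and `build_nlc_targets` are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical box layout (an assumption, not UPIDet's format):
# (cx, cy, cz, l, w, h, yaw), center-based, yaw measured about the z axis.
Box = Tuple[float, float, float, float, float, float, float]
NLC = Tuple[float, float, float]


def normalized_local_coord(point: Sequence[float], box: Box) -> Optional[NLC]:
    """Express `point` in the box's local frame, scaled so the box spans [0, 1]^3.

    Returns None when the point lies outside the box (i.e. it is background).
    """
    cx, cy, cz, l, w, h, yaw = box
    dx, dy, dz = point[0] - cx, point[1] - cy, point[2] - cz
    # Rotate by -yaw so the local x axis points along the box heading.
    c, s = math.cos(-yaw), math.sin(-yaw)
    lx = c * dx - s * dy
    ly = s * dx + c * dy
    # Divide by the box size and shift by 0.5 so one corner maps to 0 and the opposite to 1.
    nlc = (lx / l + 0.5, ly / w + 0.5, dz / h + 0.5)
    if not all(0.0 <= q <= 1.0 for q in nlc):
        return None
    return nlc


def project_to_pixel(
    point: Sequence[float], P: Sequence[Sequence[float]]
) -> Optional[Tuple[int, int, float]]:
    """Project a 3D point with a 3x4 matrix `P` (assumed LiDAR-to-image).

    Returns (col, row, depth), or None if the point is behind the camera.
    """
    x, y, z = point[0], point[1], point[2]
    u, v, d = (P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3))
    if d <= 0:
        return None
    return math.floor(u / d), math.floor(v / d), d


def build_nlc_targets(points, boxes, P, width: int, height: int) -> Dict[Tuple[int, int], NLC]:
    """Sparse NLC target map: pixel (col, row) -> NLC of the nearest foreground point on it."""
    best: Dict[Tuple[int, int], Tuple[float, NLC]] = {}
    for p in points:
        # Use the first box that contains the point; background points get no target.
        nlc = next((q for q in (normalized_local_coord(p, b) for b in boxes) if q is not None), None)
        if nlc is None:
            continue
        proj = project_to_pixel(p, P)
        if proj is None:
            continue
        col, row, depth = proj
        if not (0 <= col < width and 0 <= row < height):
            continue
        # When several points land on the same pixel, keep the closest (likely visible) one.
        if (col, row) not in best or depth < best[(col, row)][0]:
            best[(col, row)] = (depth, nlc)
    return {pix: nlc for pix, (_, nlc) in best.items()}
```

For example, with a box `(10, 0, 0, 4, 2, 1.5, 0.0)`, the box center `(10, 0, 0)` maps to `(0.5, 0.5, 0.5)`, and a point 1 m ahead of the center maps to `0.75` along the length axis. Keeping only the nearest point per pixel is a simple stand-in for visibility reasoning. The sketch does not cover the point-to-pixel module or the 2D network that regresses these targets.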
翻译:为实现可靠且精确的场景理解,自动驾驶车辆通常整合多种传感模态以利用其互补特性。然而,现有跨模态三维检测器未能充分利用图像域信息解决基于激光雷达检测器的瓶颈问题。本文提出一种新型跨模态三维目标检测器——UPIDet,旨在从两个方面释放图像分支的潜能。首先,UPIDet引入一种名为归一化局部坐标图估计的二维辅助任务,通过从图像模态学习具有局部空间感知能力的特征来补充稀疏点云。其次,我们发现通过简洁有效的点-像素模块,利用图像分支训练目标反向传播的梯度可增强点云骨干网络的表征能力。大量实验和消融研究验证了该方法的有效性。值得注意的是,在提交时,我们在KITTI基准测试中极具竞争力的骑行者类别中取得了最高排名。源代码已开源至https://github.com/Eaphan/UPIDet。