Architectures that first convert point clouds to a grid representation and then apply convolutional neural networks achieve good performance for radar-based object detection. However, the transfer from irregular point cloud data to a dense grid structure is often associated with a loss of information, due to the discretization and aggregation of points. In this paper, we propose a novel architecture, multi-scale KPPillarsBEV, that aims to mitigate the negative effects of grid rendering. Specifically, we propose a novel grid rendering method, KPBEV, which leverages the descriptive power of kernel point convolutions to improve the encoding of local point cloud contexts during grid rendering. In addition, we propose a general multi-scale grid rendering formulation to incorporate multi-scale feature maps into convolutional backbones of detection networks with arbitrary grid rendering methods. We perform extensive experiments on the nuScenes dataset and evaluate the methods in terms of detection performance and computational complexity. The proposed multi-scale KPPillarsBEV architecture outperforms the baseline by 5.37% and the previous state of the art by 2.88% in Car AP4.0 (average precision for a matching threshold of 4 meters) on the nuScenes validation set. Moreover, the proposed single-scale KPBEV grid rendering improves the Car AP4.0 by 2.90% over the baseline while maintaining the same inference speed.
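To make the information loss from discretization and aggregation concrete, the following is a minimal sketch of grid rendering into a bird's-eye-view (BEV) grid at several cell sizes. It is not the KPBEV method: a plain per-cell max pooling is swapped in as a stand-in aggregation, and the point layout `(x, y, feature)`, the function names `render_bev_grid` and `render_multi_scale`, and the doubling of the cell size per scale are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical point layout: (x, y, feature), e.g. feature = radar cross section.
Point = Tuple[float, float, float]
Grid = Dict[Tuple[int, int], float]


def render_bev_grid(points: List[Point], cell_size: float,
                    x_min: float = 0.0, y_min: float = 0.0) -> Grid:
    """Discretize points into BEV cells and keep the max feature per cell.

    Stand-in aggregation (max pooling), not the kernel point convolution
    used by KPBEV. Returns a sparse grid {(ix, iy): max_feature}.
    """
    cells: Grid = {}
    for x, y, feature in points:
        # Discretization: every point in the same cell gets the same index.
        key = (int((x - x_min) // cell_size), int((y - y_min) // cell_size))
        # Aggregation: only one value per cell survives.
        cells[key] = max(feature, cells.get(key, float("-inf")))
    return cells


def render_multi_scale(points: List[Point], base_cell_size: float,
                       num_scales: int = 3) -> List[Grid]:
    """Render the same point cloud at several resolutions.

    Assumption: the cell size doubles at each scale, mirroring the
    downsampling factor of a typical convolutional backbone.
    """
    return [render_bev_grid(points, base_cell_size * 2 ** s)
            for s in range(num_scales)]


points = [(0.2, 0.3, 1.0), (0.4, 0.1, 5.0), (1.2, 0.3, 2.0), (3.7, 2.9, 4.0)]
for scale, grid in enumerate(render_multi_scale(points, base_cell_size=0.5)):
    print(f"scale {scale}: {len(grid)} occupied cells from {len(points)} points")
```

In this toy input, the first two points fall into the same cell at every scale, so the lower feature value and both exact positions are lost after rendering. This is the kind of loss that a richer local encoding during grid rendering is meant to reduce.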