Object detection is a central problem in intelligent traffic systems, and recent advances in single-vehicle lidar-based 3D detection show that it can provide accurate position information for intelligent agents to make decisions and plan. Compared with single-vehicle perception, multi-view vehicle-road cooperative perception has fundamental advantages, such as the elimination of blind spots and a broader perception range, and has become a research hotspot. However, current cooperative perception research focuses on increasing the complexity of fusion while ignoring the fundamental problems caused by missing outlines in single-view data. We propose a multi-view vehicle-road cooperative perception system, vehicle-to-everything cooperative perception (V2X-AHD), to enhance identification capability, particularly for predicting vehicle shape. First, we propose an asymmetric heterogeneous distillation network fed with different training data, which transfers multi-view teacher features to single-view student features to improve the accuracy of contour recognition. Second, because point cloud data are sparse, we propose Spara Pillar, a sparse-convolution-based plug-in feature extraction backbone, to reduce the number of parameters and enhance feature extraction capability. Moreover, we use multi-head self-attention (MSA) to fuse the single-view features, and the lightweight design gives the fused feature a smooth expression. Results on the large-scale open dataset V2Xset demonstrate that our method achieves state-of-the-art performance. V2X-AHD effectively improves the accuracy of 3D object detection while reducing the number of network parameters, and can serve as a benchmark for cooperative perception. The code for this article is available at https://github.com/feeling0414-lab/V2X-AHD.
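The two mechanisms named in the abstract, attention-based fusion of per-agent features and teacher-to-student feature distillation, can be illustrated with a minimal pure-Python sketch. This is not the V2X-AHD implementation: the function names `msa_fuse` and `distillation_loss` are hypothetical, the attention uses identity projections instead of learned query/key/value weights, and mean squared error stands in for whatever distillation loss the authors actually use.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def msa_fuse(agent_feats, num_heads):
    """Fuse per-agent feature vectors for one spatial cell.

    agent_feats: list of equal-length feature vectors, ego agent first.
    Assumption: identity Q/K/V projections (a real model learns these).
    Returns the ego agent's attention-weighted fused vector.
    """
    dim = len(agent_feats[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    fused = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        heads = [feat[lo:hi] for feat in agent_feats]
        query = heads[0]  # the ego agent attends to every agent, itself included
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
            for key in heads
        ]
        weights = softmax(scores)
        for d in range(head_dim):
            fused.append(sum(w * value[d] for w, value in zip(weights, heads)))
    return fused


def distillation_loss(student_feat, teacher_feat):
    """Mean squared error between student and teacher features.

    Assumption: MSE is a stand-in; the paper's loss may differ.
    """
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    vehicle = [1.0, 0.0, 0.0, 1.0]   # toy single-view feature from the ego vehicle
    roadside = [0.0, 1.0, 1.0, 0.0]  # toy single-view feature from a roadside unit
    student = msa_fuse([vehicle, roadside], num_heads=2)
    teacher = [0.5, 0.5, 0.5, 0.5]   # toy multi-view teacher feature
    print(student, distillation_loss(student, teacher))
```

In this sketch the student feature is the fused output of single-view inputs, and minimizing the loss would pull it toward the teacher feature computed from multi-view data, which is the direction of transfer the abstract describes.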