The objective of collaborative vehicle-to-everything (V2X) perception is to enhance an individual vehicle's perception capability through message communication among neighboring traffic agents. Previous methods focus on achieving optimal performance within bandwidth limitations and typically adopt bird's-eye-view (BEV) maps as the basic collaborative message units. However, we demonstrate that collaboration with dense representations suffers from three problems: object features are destroyed during message packing, message aggregation is inefficient for long-range collaboration, and structure information is communicated only implicitly. To tackle these issues, we introduce a new message unit, the point cluster, which represents the scene sparsely by combining low-level structure information with high-level semantic information. The point cluster inherently preserves object information during message packing, depends only weakly on the collaboration range, and supports explicit structure modeling. Building upon this representation, we propose V2X-PC, a novel framework for collaborative perception. The framework includes a Point Cluster Packing (PCP) module that preserves object features and manages bandwidth by controlling the number of points in each cluster. For effective message aggregation, we propose a Point Cluster Aggregation (PCA) module that matches and merges point clusters associated with the same object. To further handle the latency and pose errors encountered in real-world scenarios, we propose parameter-free solutions that adapt to different noise levels without finetuning. Experiments on two widely recognized collaborative perception benchmarks show that our method outperforms previous state-of-the-art approaches that rely on BEV maps.
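To make the two modules named in the abstract more concrete, the following is a minimal Python sketch of the general idea: a point cluster carries raw points plus one semantic feature vector, packing limits the number of points per cluster, and aggregation matches and merges clusters that appear to describe the same object. The abstract does not specify how PCP or PCA are implemented, so everything here is an assumption made for illustration: the names `PointCluster`, `pack`, and `aggregate`, the uniform striding used for subsampling, the greedy center-distance matching, and the feature averaging are all hypothetical and are not taken from V2X-PC.

```python
from dataclasses import dataclass
from math import dist

Point = tuple[float, float, float]


@dataclass
class PointCluster:
    """Hypothetical message unit: raw points plus one semantic feature vector."""
    points: list[Point]    # low-level structure information
    feature: list[float]   # high-level semantic information

    @property
    def center(self) -> Point:
        n = len(self.points)
        return tuple(sum(p[i] for p in self.points) / n for i in range(3))


def pack(cluster: PointCluster, max_points: int) -> PointCluster:
    """Cap the point count to fit a bandwidth budget (assumed: uniform striding).

    The feature vector is passed through unchanged, so the object's
    semantics survive packing even when points are dropped.
    """
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    n = len(cluster.points)
    if n <= max_points:
        return cluster
    step = n / max_points
    kept = [cluster.points[int(i * step)] for i in range(max_points)]
    return PointCluster(kept, list(cluster.feature))


def aggregate(ego: list[PointCluster], received: list[PointCluster],
              match_radius: float) -> list[PointCluster]:
    """Match received clusters to existing ones by center distance and merge.

    Assumed strategy: greedy nearest-center matching within match_radius;
    matched features are averaged, unmatched clusters are appended as new objects.
    """
    # Copy the ego clusters so the caller's data is not mutated.
    merged = [PointCluster(list(c.points), list(c.feature)) for c in ego]
    for r in received:
        best = min(merged, key=lambda c: dist(c.center, r.center), default=None)
        if best is not None and dist(best.center, r.center) <= match_radius:
            best.points.extend(r.points)
            best.feature = [(a + b) / 2 for a, b in zip(best.feature, r.feature)]
        else:
            merged.append(PointCluster(list(r.points), list(r.feature)))
    return merged


# Usage: one ego cluster, two received clusters packed to at most 2 points each.
ego = [PointCluster([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 0.0])]
received = [
    PointCluster([(0.5, 0.2, 0.0), (0.6, 0.1, 0.0),
                  (0.4, 0.0, 0.0), (0.5, 0.0, 0.0)], [0.0, 1.0]),
    PointCluster([(20.0, 0.0, 0.0)], [0.5, 0.5]),
]
packed = [pack(c, max_points=2) for c in received]
fused = aggregate(ego, packed, match_radius=1.0)
```

In this sketch the cost of aggregation scales with the number of clusters rather than with the area covered, which is one way to read the abstract's claim that point clusters depend only weakly on the collaboration range. A real system would also need to transform received points into the ego frame before matching, which is omitted here.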