As a potential application of vehicle-to-everything (V2X) communication, multi-agent collaborative perception could significantly improve the perception performance of autonomous vehicles compared with single-agent perception. However, several challenges remain in achieving pragmatic information sharing in this emerging research area. In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates spatio-temporal awareness characteristics across on-road agents in an end-to-end manner. Specifically, SCOPE has three distinct strengths: i) it leverages effective semantic cues from the temporal context to enhance the current representation of the target agent; ii) it aggregates perceptually critical spatial information from heterogeneous agents and overcomes localization errors via multi-scale feature interactions; iii) it integrates multi-source representations of the target agent according to their complementary contributions via an adaptive fusion paradigm. To thoroughly evaluate SCOPE, we consider collaborative 3D object detection in both real-world and simulated scenarios on three datasets. Extensive experiments demonstrate the superiority of our approach and the necessity of the proposed components.