Multi-agent collaborative perception, a promising application of vehicle-to-everything (V2X) communication, could significantly improve the perception performance of autonomous vehicles compared with single-agent perception. However, several challenges remain in achieving pragmatic information sharing in this emerging research area. In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates spatio-temporal awareness characteristics across on-road agents in an end-to-end manner. Specifically, SCOPE has three distinct strengths: i) it exploits effective semantic cues from the temporal context to enhance the current representation of the target agent; ii) it aggregates perceptually critical spatial information from heterogeneous agents and overcomes localization errors via multi-scale feature interactions; and iii) it integrates multi-source representations of the target agent through an adaptive fusion paradigm that weighs their complementary contributions. To evaluate SCOPE thoroughly, we consider collaborative 3D object detection in both real-world and simulated scenarios on three datasets. Extensive experiments demonstrate the superiority of our approach and the necessity of the proposed components.
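To make the idea of fusing multi-source representations by their complementary contributions (strength iii) more concrete, the following is a minimal, generic sketch of per-location adaptive fusion. It is not SCOPE's actual fusion module. It assumes that each source (for example, a temporally enhanced map and a spatially aggregated map of the target agent) is a flattened bird's-eye-view feature map with one C-dimensional vector per cell. The scoring vector `score_weights` and the function names are hypothetical stand-ins for what would be a learned layer in a real network. At every cell, each source receives a scalar score, the scores are normalized with a softmax across sources, and the fused feature is the resulting weighted sum.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[Vector]  # one C-dim feature vector per flattened BEV cell


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(sources: List[FeatureMap], score_weights: Vector) -> FeatureMap:
    """Fuse K same-shaped feature maps cell by cell (illustrative sketch only).

    `score_weights` is a hypothetical fixed scoring vector; a trained model
    would instead predict per-cell scores with learned layers.
    """
    if not sources:
        raise ValueError("need at least one source")
    num_cells = len(sources[0])
    channels = len(score_weights)
    for src in sources:
        if len(src) != num_cells or any(len(v) != channels for v in src):
            raise ValueError("all sources must share the same shape")

    fused: FeatureMap = []
    for cell in range(num_cells):
        vectors = [src[cell] for src in sources]
        # Scalar score per source at this cell: dot product with score_weights.
        scores = [sum(w * x for w, x in zip(score_weights, v)) for v in vectors]
        weights = softmax(scores)  # contributions sum to 1 across sources
        fused.append(
            [sum(wk * v[c] for wk, v in zip(weights, vectors)) for c in range(channels)]
        )
    return fused


if __name__ == "__main__":
    temporal = [[1.0, 2.0]]  # hypothetical temporally enhanced features
    spatial = [[3.0, 4.0]]   # hypothetical spatially aggregated features
    print(adaptive_fuse([temporal, spatial], [0.0, 0.0]))
```

With a zero scoring vector, every source gets the same score, so the fusion reduces to a plain average. A non-zero scoring vector lets whichever source is more informative at a given cell dominate there. That per-location weighting is what distinguishes adaptive fusion from a fixed average or concatenation.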