By sharing complementary perceptual information, multi-agent collaborative perception fosters a deeper understanding of the environment. Recent studies on collaborative perception mostly utilize CNNs or Transformers to learn feature representation and fusion in the spatial dimension, which struggle to handle long-range spatial-temporal features under limited computing and communication resources. Holistically modeling the dependencies over extensive spatial areas and extended temporal frames is crucial to enhancing feature quality. To this end, we propose a resource-efficient cross-agent spatial-temporal collaborative state space model (SSM), named CollaMamba. First, we construct a foundational backbone network based on a spatial SSM. This backbone adeptly captures positional causal dependencies from both single-agent and cross-agent views, yielding compact and comprehensive intermediate features while maintaining linear complexity. Furthermore, we devise a history-aware feature boosting module based on a temporal SSM, which extracts contextual cues from extended historical frames to refine vague features while preserving low overhead. Extensive experiments across several datasets demonstrate that CollaMamba outperforms state-of-the-art methods, achieving higher model accuracy while reducing computational overhead by up to 71.9% and communication overhead to as little as 1/64. This work pioneers the exploration of Mamba's potential in collaborative perception. The source code will be made available.
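To make the core mechanism concrete, the sketch below shows a minimal Mamba-style selective state space block of the kind such a backbone could build on: input-dependent dynamics scanned over a flattened feature sequence in linear time. This is an illustrative assumption, not the authors' released implementation; the class name SelectiveSSMBlock and all parameter names are hypothetical.

```python
# Minimal sketch of a selective SSM (Mamba-style) block, assuming PyTorch.
# All names and shapes are illustrative; this is not CollaMamba's actual code.
import torch
import torch.nn as nn


class SelectiveSSMBlock(nn.Module):
    """Scans a feature sequence with input-dependent state dynamics,
    giving linear complexity in sequence length."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.d_state = d_state
        # Input-dependent (selective) projections, as in Mamba.
        self.to_delta = nn.Linear(d_model, d_model)
        self.to_B = nn.Linear(d_model, d_state)
        self.to_C = nn.Linear(d_model, d_state)
        # Learned state matrix A, log-parameterized for stability.
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), e.g. a BEV feature map flattened
        # along a causal scan order.
        bsz, seq_len, d_model = x.shape
        A = -torch.exp(self.A_log)                                # (D, N)
        delta = nn.functional.softplus(self.to_delta(x))          # (B, L, D)
        B_seq = self.to_B(x)                                      # (B, L, N)
        C_seq = self.to_C(x)                                      # (B, L, N)
        h = x.new_zeros(bsz, d_model, self.d_state)               # recurrent state
        outputs = []
        for t in range(seq_len):                                  # linear-time scan
            dA = torch.exp(delta[:, t].unsqueeze(-1) * A)         # (B, D, N)
            dB = delta[:, t].unsqueeze(-1) * B_seq[:, t].unsqueeze(1)
            h = dA * h + dB * x[:, t].unsqueeze(-1)               # state update
            outputs.append((h * C_seq[:, t].unsqueeze(1)).sum(-1))  # readout (B, D)
        return torch.stack(outputs, dim=1)                        # (B, L, D)
```

Under the abstract's description, one plausible use of such a block is to flatten each agent's spatial feature map into a sequence for the spatial scan, concatenate ego and collaborator sequences for the cross-agent view, and run an analogous scan over stacked historical frames for the temporal module; the exact scan orders and fusion scheme are details of the paper, not of this sketch.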