Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformers incurs high computational cost, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSMs) for long-sequence modeling, we introduce Mamba, an SSM-based architecture, to the point cloud domain and propose Mamba24/8D, which offers strong global modeling capability with linear complexity. Specifically, to reconcile the unordered nature of point clouds with the causal nature of Mamba, we propose a multi-path serialization strategy tailored to point clouds. In addition, we propose the ConvMamba block to compensate for Mamba's shortcomings in modeling local geometry and in unidirectional modeling. Mamba24/8D achieves state-of-the-art results on several 3D point cloud segmentation benchmarks, including ScanNet v2, ScanNet200, and nuScenes, and its effectiveness is validated by extensive experiments.