Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformers incurs high computational cost, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSMs) for long-sequence modeling, we introduce Mamba, an SSM-based architecture, to the point cloud domain and propose Mamba24/8D, which offers strong global modeling capability with linear complexity. Specifically, to reconcile the unordered nature of point clouds with the causal nature of Mamba, we propose a multi-path serialization strategy tailored to point clouds. In addition, we propose the ConvMamba block to compensate for Mamba's shortcomings in modeling local geometry and in unidirectional modeling. Mamba24/8D achieves state-of-the-art results on several 3D point cloud segmentation benchmarks, including ScanNet v2, ScanNet200, and nuScenes, and its effectiveness is validated by extensive experiments.