Query-based 3D scene instance segmentation from point clouds has attained notable performance. However, existing methods face a query initialization dilemma caused by the sparse nature of point clouds, and their query decoders rely on computationally intensive attention mechanisms. We therefore introduce LaSSM, which prioritizes simplicity and efficiency while maintaining competitive performance. Specifically, we propose a hierarchical semantic-spatial query initializer that derives the query set from superpoints by considering both semantic cues and spatial distribution, achieving comprehensive scene coverage and accelerated convergence. We further present a coordinate-guided state space model (SSM) decoder that progressively refines queries. The decoder features a local aggregation scheme that restricts the model's focus to geometrically coherent regions, and a spatial dual-path SSM block that captures underlying dependencies within the query set by integrating the associated coordinate information. This design enables efficient instance prediction, avoiding the incorporation of noisy information and reducing redundant computation. LaSSM ranks first on the latest ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with only 1/3 of the FLOPs, demonstrating its strength in challenging large-scale scene instance segmentation. LaSSM also achieves competitive performance on the ScanNet, ScanNet200, S3DIS, and ScanNet++ V1 benchmarks at lower computational cost. Extensive ablation studies and qualitative results validate the effectiveness of our design. The code and weights are available at https://github.com/RayYoh/LaSSM.
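To make the idea of a semantic-spatial query initializer concrete, the following is a minimal sketch and not the LaSSM implementation. It assumes each superpoint has a 3D center and a scalar foreground score, filters superpoints by that score (the semantic stage), and then spreads the queries over the scene with farthest point sampling (the spatial stage). The function name `init_queries`, the `score_thresh` parameter, and the use of farthest point sampling are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def init_queries(
    centers: Sequence[Point],
    fg_scores: Sequence[float],
    num_queries: int,
    score_thresh: float = 0.5,
) -> List[int]:
    """Return indices of superpoints used to seed queries.

    Hypothetical two-stage scheme (an assumption, not the paper's exact method):
      1. semantic: keep superpoints whose foreground score >= score_thresh;
      2. spatial: farthest point sampling over the kept superpoint centers.
    """
    if len(centers) != len(fg_scores):
        raise ValueError("centers and fg_scores must have the same length")
    if num_queries <= 0:
        return []

    # Stage 1: semantic filtering.
    candidates = [i for i, s in enumerate(fg_scores) if s >= score_thresh]
    if len(candidates) <= num_queries:
        return candidates

    # Stage 2: farthest point sampling, seeded at the most confident candidate.
    first = max(candidates, key=lambda i: fg_scores[i])
    selected = [first]
    chosen = {first}
    # min_dist[i] is the distance from candidate i to its nearest selected center.
    min_dist = {i: math.dist(centers[i], centers[first]) for i in candidates}

    while len(selected) < num_queries:
        # Pick the unselected candidate farthest from everything selected so far.
        nxt = max((i for i in candidates if i not in chosen), key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        for i in candidates:
            min_dist[i] = min(min_dist[i], math.dist(centers[i], centers[nxt]))
    return selected
```

In this sketch the semantic stage discards likely background superpoints before any spatial reasoning, so the sampling budget is spent only on regions that may contain instances; the spatial stage then avoids clustering several queries on the same object.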