Current 3D object detection methods for indoor scenes mainly follow the voting-and-grouping strategy to generate proposals. However, most methods use instance-agnostic grouping, such as ball query, which leads to inconsistent semantic information and inaccurate regression of the proposals. To this end, we propose a novel superpoint grouping network for indoor anchor-free one-stage 3D object detection. Specifically, we first partition raw point clouds in an unsupervised manner into superpoints, which are areas with semantic consistency and spatial similarity. Then, we design a geometry-aware voting module that adapts to the centerness used in anchor-free detection by constraining the spatial relationship between superpoints and object centers. Next, we present a superpoint-based grouping module to explore consistent representations within proposals. This module includes a superpoint attention layer, which learns feature interactions between neighboring superpoints, and a superpoint-voxel fusion layer, which propagates superpoint-level information to the voxel level. Finally, during training we employ an effective multiple-matching strategy to capitalize on the dynamic, superpoint-based receptive fields of proposals. Experimental results demonstrate that our method achieves state-of-the-art performance among indoor one-stage 3D object detectors on the ScanNet V2, SUN RGB-D, and S3DIS datasets. Source code is available at https://github.com/zyrant/SPGroup3D.
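The abstract names the two layers of the superpoint-based grouping module but does not define them. The following is a minimal pure-Python sketch of the general idea, not the SPGroup3D implementation. It assumes that superpoint features are mean-pooled from voxel features, that the superpoint attention layer is single-head scaled dot-product attention over a precomputed neighbor graph, and that the superpoint-voxel fusion layer simply concatenates each voxel feature with its superpoint feature. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set

Vec = List[float]


def superpoint_pool(feats: Sequence[Vec], sp_ids: Sequence[int]) -> Dict[int, Vec]:
    """Mean-pool voxel features into one vector per superpoint (assumed pooling)."""
    sums: Dict[int, Vec] = {}
    counts: Dict[int, int] = defaultdict(int)
    for f, sp in zip(feats, sp_ids):
        acc = sums.setdefault(sp, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[sp] += 1
    return {sp: [v / counts[sp] for v in acc] for sp, acc in sums.items()}


def superpoint_attention(sp_feats: Dict[int, Vec],
                         neighbors: Dict[int, Set[int]]) -> Dict[int, Vec]:
    """Hypothetical attention layer: each superpoint attends over itself
    and its neighbors (neighbor sets are assumed to exclude the superpoint)."""
    out: Dict[int, Vec] = {}
    for sp, q in sp_feats.items():
        keys = [sp] + sorted(neighbors.get(sp, set()))
        scale = math.sqrt(len(q))
        logits = [sum(a * b for a, b in zip(q, sp_feats[k])) / scale for k in keys]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(z - m) for z in logits]
        total = sum(weights)
        out[sp] = [sum(w * sp_feats[k][i] for w, k in zip(weights, keys)) / total
                   for i in range(len(q))]
    return out


def superpoint_voxel_fusion(feats: Sequence[Vec], sp_ids: Sequence[int],
                            sp_feats: Dict[int, Vec]) -> List[Vec]:
    """Propagate superpoint-level features back to voxels by concatenation."""
    return [list(f) + sp_feats[sp] for f, sp in zip(feats, sp_ids)]
```

For example, with three voxels where the first two share superpoint `0`, `superpoint_pool([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]], [0, 0, 1])` gives `{0: [2.0, 1.0], 1: [0.0, 4.0]}`, and after fusion both voxels of superpoint `0` carry the same superpoint-level suffix. The point of this design is that the grouping follows instance-consistent superpoint boundaries rather than a fixed-radius ball query; a real implementation would operate on batched sparse tensors rather than Python lists.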
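The abstract states only that the geometry-aware voting module constrains the spatial relationship between superpoints and object centers so that it agrees with centerness in anchor-free detection. A minimal sketch of what such per-superpoint targets could look like is given below. It assumes axis-aligned ground-truth boxes, represents each superpoint by its centroid, and uses the FCOS-style centerness (square root of the product of per-axis min/max face-distance ratios), extended here to three axes as an assumption. `vote_target` is a hypothetical helper, not the formulation used by SPGroup3D.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def vote_target(centroid: Vec3, box_center: Vec3,
                box_size: Vec3) -> Optional[Tuple[Vec3, float]]:
    """Return (offset from the superpoint centroid to the box center, centerness),
    or None if the centroid is not strictly inside the axis-aligned box."""
    ratio = 1.0
    for c, bc, s in zip(centroid, box_center, box_size):
        lo = c - (bc - s / 2)  # distance to the min face along this axis
        hi = (bc + s / 2) - c  # distance to the max face along this axis
        if lo <= 0 or hi <= 0:
            return None  # outside or on the boundary: no vote target assigned
        ratio *= min(lo, hi) / max(lo, hi)
    offset = tuple(bc - c for c, bc in zip(centroid, box_center))
    # Centerness is 1.0 at the box center and decays toward the faces.
    return offset, math.sqrt(ratio)
```

A superpoint centered in a 2 m cube gets a zero offset and centerness `1.0`, while one shifted 0.5 m along x gets offset `(-0.5, 0.0, 0.0)` and centerness `sqrt(1/3)`. Tying the vote target to centerness in this way down-weights superpoints near object boundaries, which is the behavior the abstract attributes to the geometry-aware voting module.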