3D Referring Expression Segmentation (3D-RES) aims to segment 3D objects by correlating referring expressions with point clouds. However, traditional approaches frequently suffer from over-segmentation or mis-segmentation because they place insufficient emphasis on the spatial information of instances. In this paper, we introduce a Rule-Guided Spatial Awareness Network (RG-SAN), which uses solely the spatial information of the target instance for supervision. This approach enables the network to accurately depict the spatial relationships among all entities described in the text, thereby enhancing its reasoning capabilities. RG-SAN consists of a Text-driven Localization Module (TLM) and a Rule-guided Weak Supervision (RWS) strategy. The TLM first locates all mentioned instances and then iteratively refines their positional information. The RWS strategy, recognizing that only the target object has supervised positional information, employs dependency-tree rules to precisely guide the positioning of the core instance. Extensive experiments on the ScanRefer benchmark show that RG-SAN not only sets a new state of the art, improving mIoU by 5.1 points, but also exhibits significantly improved robustness when processing descriptions with spatial ambiguity. Code is available at https://github.com/sosppxo/RG-SAN.
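To make the dependency-tree idea behind RWS concrete, the sketch below is a toy illustration (not the authors' implementation, whose details are in the linked repository): given a hand-built dependency parse of a referring expression, the tree root is taken as the core (target) instance, and nouns attached through prepositional relations are treated as auxiliary anchor instances. The `Token` structure and rule names here are illustrative assumptions.

```python
# Toy sketch of dependency-tree rules for picking the core instance
# in a referring expression. Illustrative only: a real system would
# obtain the parse from a dependency parser rather than by hand.
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    head: int  # index of the head token; -1 marks the tree root
    dep: str   # dependency relation label to the head

def find_core_instance(tokens):
    """The grammatical root of the description is treated as the
    core (target) instance, the only one with positional supervision."""
    for i, t in enumerate(tokens):
        if t.head == -1:
            return i
    raise ValueError("no root in dependency tree")

def anchor_instances(tokens):
    """Objects of prepositions ("pobj") are treated as the auxiliary
    anchor instances that spatially relate to the core instance."""
    return [i for i, t in enumerate(tokens) if t.dep == "pobj"]

# Hand-built parse of "the chair next to the table":
sent = [
    Token("the", 1, "det"),
    Token("chair", -1, "root"),   # core instance
    Token("next", 1, "advmod"),
    Token("to", 2, "prep"),
    Token("the", 5, "det"),
    Token("table", 3, "pobj"),    # anchor instance
]
core = find_core_instance(sent)
print(sent[core].text)                                  # -> chair
print([sent[i].text for i in anchor_instances(sent)])   # -> ['table']
```

In this toy parse, the rules single out "chair" as the supervised target while "table" remains a weakly supervised anchor, mirroring the asymmetry that RWS exploits.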