Region-Enhanced Feature Learning for Scene Semantic Segmentation

Semantic segmentation in complex scenes relies not only on object appearance but also on object location and the surrounding environment. Nonetheless, it is difficult to model long-range context in the form of pairwise point correlations: the number of such correlations grows quadratically with the number of points, making the computational cost prohibitive for large-scale point clouds. In this paper, we propose using regions, instead of fine-grained points or voxels, as the intermediate representation of point clouds to reduce the computational burden. We introduce a novel Region-Enhanced Feature Learning Network (REFL-Net) that leverages region correlations to enhance point feature learning. We design a region-based feature enhancement (RFE) module consisting of two stages: Semantic-Spatial Region Extraction and Region Dependency Modeling. In the first stage, the input points are grouped into a set of regions according to their semantic and spatial proximity. In the second stage, we capture inter-region semantic and spatial relationships by applying a self-attention block to the region features, and then fuse the point features with the region features to obtain more discriminative representations. The proposed RFE module is plug-and-play and can be integrated with common semantic segmentation backbones. We conduct extensive experiments on the ScanNetV2 and S3DIS datasets and evaluate the RFE module with different segmentation backbones. Compared with the backbone models, REFL-Net achieves mIoU gains of 1.8% on ScanNetV2 and 1.7% on S3DIS with negligible additional computational cost. Both quantitative and qualitative results demonstrate the strong long-range context modeling and generalization abilities of REFL-Net.
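To make the two RFE stages more concrete, here is a minimal pure-Python sketch. It is a hypothetical stand-in, not the REFL-Net implementation. It assumes that Semantic-Spatial Region Extraction can be approximated by grouping points that share a predicted semantic label and a coarse grid cell (the `cell_size` parameter), that a region feature is the mean of its points' features, that Region Dependency Modeling is a parameter-free scaled dot-product self-attention whose logits are penalized by centroid distance (the `dist_weight` parameter, an assumed spatial term), and that fusion is simple concatenation. All function names and parameters are illustrative.

```python
import math


def extract_regions(points, labels, cell_size=1.0):
    """Stage 1 stand-in (hypothetical): points sharing a predicted label
    and a coarse grid cell are assigned to the same region."""
    region_ids = {}
    assignment = []
    for (x, y, z), lab in zip(points, labels):
        key = (lab,
               math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        if key not in region_ids:
            region_ids[key] = len(region_ids)
        assignment.append(region_ids[key])
    return assignment, len(region_ids)


def pool_regions(feats, points, assignment, num_regions):
    """Average point features and coordinates within each region."""
    dim = len(feats[0])
    feat_sums = [[0.0] * dim for _ in range(num_regions)]
    coord_sums = [[0.0] * 3 for _ in range(num_regions)]
    counts = [0] * num_regions
    for f, p, r in zip(feats, points, assignment):
        counts[r] += 1
        for k in range(dim):
            feat_sums[r][k] += f[k]
        for k in range(3):
            coord_sums[r][k] += p[k]
    region_feats = [[s / counts[r] for s in feat_sums[r]] for r in range(num_regions)]
    centroids = [[c / counts[r] for c in coord_sums[r]] for r in range(num_regions)]
    return region_feats, centroids


def region_self_attention(region_feats, centroids, dist_weight=1.0):
    """Stage 2 stand-in (hypothetical): scaled dot-product attention over
    regions, with each logit reduced by the distance between centroids."""
    dim = len(region_feats[0])
    enhanced = []
    for i, q in enumerate(region_feats):
        logits = []
        for j, k_vec in enumerate(region_feats):
            dot = sum(a * b for a, b in zip(q, k_vec)) / math.sqrt(dim)
            logits.append(dot - dist_weight * math.dist(centroids[i], centroids[j]))
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        enhanced.append([
            sum(w * region_feats[j][k] for j, w in enumerate(weights)) / total
            for k in range(dim)
        ])
    return enhanced


def rfe_forward(points, feats, labels, cell_size=1.0, dist_weight=1.0):
    """Group points, attend over regions, then concatenate each point
    feature with its region's enhanced feature (assumed fusion rule)."""
    assignment, n = extract_regions(points, labels, cell_size)
    region_feats, centroids = pool_regions(feats, points, assignment, n)
    enhanced = region_self_attention(region_feats, centroids, dist_weight)
    fused = [list(f) + enhanced[r] for f, r in zip(feats, assignment)]
    return fused, assignment


# Toy scene: two well-separated, semantically distinct pairs of points.
points = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.0), (2.5, 0.2, 0.0), (2.7, 0.4, 0.0)]
feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = [0, 0, 1, 1]
fused, assignment = rfe_forward(points, feats, labels)
print(assignment)  # [0, 0, 1, 1]
```

In this toy example the four points form two regions, and each fused feature holds the point's original two channels followed by two region-level channels. Attention runs over 2 regions rather than 4 points, which mirrors the cost argument in the abstract; a learned module would replace the fixed grouping rule, identity projections, and concatenation with trainable components.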
