3D point cloud semantic segmentation has a wide range of applications. Weakly supervised point cloud segmentation methods have recently been proposed to reduce the expensive and laborious manual annotation process by relying on scene-level labels alone. However, these methods do not effectively exploit the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Current approaches also fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. In addition, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To address these issues, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method built around a newly introduced multi-modality point affinity inference module. The proposed point affinity is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights, which alleviates the detrimental effects of the long-tailed distribution without requiring prior knowledge of the category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of the proposed method, which outperforms the state of the art by roughly 4% to 6% mIoU. Code is released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.
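The two ideas named in the abstract, a point affinity built from several modalities and a classifier whose weights are normalized, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the released repository for that). It assumes cosine similarity as the affinity measure, a plain average to fuse the geometric and RGB modalities, and L2 normalization of each class weight vector. The function names `multimodal_affinity`, `normalized_logits`, and `propagate` are hypothetical and chosen for illustration only.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm (eps guards against zero vectors)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def multimodal_affinity(geo_feats, rgb_feats):
    """Assumed fusion rule: average the per-modality cosine similarities.

    geo_feats: (N, D_geo) per-point geometric features
    rgb_feats: (N, D_rgb) per-point appearance features
    Returns an (N, N) affinity matrix clipped to [0, 1].
    """
    g = l2_normalize(geo_feats)
    c = l2_normalize(rgb_feats)
    affinity = 0.5 * (g @ g.T + c @ c.T)
    return np.clip(affinity, 0.0, 1.0)


def normalized_logits(feats, class_weights):
    """Class logits using unit-norm class weight vectors.

    Removing the per-class weight magnitude means no class gets larger
    logits merely because its weight vector has a larger norm; no class
    frequency statistics are needed for this step.
    """
    return feats @ l2_normalize(class_weights).T


def propagate(scores, affinity):
    """Refine per-point class scores by averaging over affine neighbours."""
    row_sum = affinity.sum(axis=1, keepdims=True)
    return (affinity / np.maximum(row_sum, 1e-12)) @ scores
```

In this sketch the normalization makes the logits invariant to any per-class rescaling of the classifier weights, which is one way to read the abstract's claim that the long-tail effect is reduced without a category distribution prior. How the paper actually combines modalities and applies the normalization to the affinity may differ.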