Self-supervised 3D representation learning aims to learn effective representations from large-scale unlabeled point clouds. Most existing approaches adopt point discrimination as the pretext task, which treats matched points in two distinct views as positive pairs and unmatched points as negative pairs. However, because unmatched points that share the same semantics are still treated as negatives, this approach produces many false negatives and often gives semantically identical points dissimilar representations, a problem we refer to as "semantic conflict". To address this issue, we propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning. Segment grouping partitions points into semantically meaningful regions, which enhances semantic coherence and provides semantic guidance for the subsequent contrastive representation learning. Semantic-aware contrastive learning augments the semantic information extracted from segment grouping and helps alleviate the "semantic conflict" problem. We conduct extensive experiments on multiple 3D scene understanding tasks. The results demonstrate that GroupContrast learns semantically meaningful representations and achieves promising transfer learning performance.
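To make the "semantic conflict" concrete, the following is a minimal NumPy sketch of a point-discrimination InfoNCE loss, with an optional mask that drops same-segment points from the negatives. It is an illustration under stated assumptions, not the GroupContrast objective itself: the function name `point_infonce`, the `segment_ids` argument, and the temperature value are all hypothetical, and the abstract does not specify the actual loss.

```python
import numpy as np


def _log_softmax(logits):
    # Numerically stable log-softmax over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def point_infonce(feat_a, feat_b, temperature=0.1, segment_ids=None):
    """InfoNCE over N matched points (row i of feat_a matches row i of feat_b).

    segment_ids is a hypothetical array of per-point group labels. When given,
    unmatched points in the same segment are excluded from the negatives,
    which is one simple way to avoid false negatives (an assumption, not
    necessarily the paper's formulation).
    """
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) cosine similarities, scaled

    if segment_ids is not None:
        same_segment = segment_ids[:, None] == segment_ids[None, :]
        np.fill_diagonal(same_segment, False)  # keep the matched pair itself
        logits = np.where(same_segment, -np.inf, logits)

    # The positive for row i is the diagonal entry (i, i).
    return -np.mean(np.diag(_log_softmax(logits)))


rng = np.random.default_rng(0)
features = rng.normal(size=(6, 8))          # toy per-point features
segments = np.array([0, 0, 0, 1, 1, 2])     # toy segment assignment
plain_loss = point_infonce(features, features)
aware_loss = point_infonce(features, features, segment_ids=segments)
```

In this toy setup, masking only removes terms from each row's softmax denominator, so `aware_loss` is never larger than `plain_loss`; the gap reflects the penalty that plain point discrimination places on same-segment points.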