Contrastive learning has recently demonstrated great potential for unsupervised pre-training in 3D scene understanding tasks. However, most existing work randomly selects point features as anchors when constructing contrastive pairs, which biases sampling toward the background points that often dominate 3D scenes. Object awareness and foreground-to-background discrimination are also neglected, which makes contrastive learning less effective. To tackle these issues, we propose a general foreground-aware feature contrast (FAC) framework to learn more effective point cloud representations in pre-training. FAC consists of two novel contrast designs that construct more effective and informative contrast pairs. First, we build positive pairs within the same foreground segment, where points tend to share the same semantics. Second, we prevent over-discrimination between 3D segments/objects and encourage foreground-to-background distinctions at the segment level through adaptive feature learning in a Siamese correspondence network, which learns feature correlations within and across point cloud views. Visualization with point activation maps shows that our contrast pairs capture clear correspondences among foreground regions during pre-training. Quantitative experiments also show that FAC achieves superior knowledge transfer and data efficiency in various downstream 3D semantic segmentation and object detection tasks.
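The first contrast design, drawing anchors from foreground points and pairing each with a point from the same segment, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_foreground_pairs` and `info_nce` are hypothetical, the per-point `segment_ids` and `foreground_mask` are assumed to be given (for example, from an over-segmentation step), and a standard InfoNCE loss stands in for the segment-level objective. The Siamese correspondence network is not modeled here.

```python
import numpy as np

def sample_foreground_pairs(segment_ids, foreground_mask, num_anchors, rng):
    """Hypothetical helper: pick anchors among foreground points only and
    pair each anchor with another point from the same segment."""
    fg_idx = np.flatnonzero(foreground_mask)
    n = min(num_anchors, fg_idx.size)
    anchors = rng.choice(fg_idx, size=n, replace=False)
    positives = np.empty_like(anchors)
    for i, a in enumerate(anchors):
        same = np.flatnonzero(segment_ids == segment_ids[a])
        candidates = same[same != a]
        # Fall back to the anchor itself if its segment has a single point.
        positives[i] = rng.choice(candidates) if candidates.size else a
    return anchors, positives

def info_nce(features, segment_ids, anchors, positives, temperature=0.1):
    """Standard InfoNCE over the sampled pairs (an assumed stand-in loss).
    Off-diagonal entries from the anchor's own segment are masked out so
    that same-segment points are never treated as negatives."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = f[anchors] @ f[positives].T / temperature
    same_seg = segment_ids[anchors][:, None] == segment_ids[positives][None, :]
    off_diag = ~np.eye(len(anchors), dtype=bool)
    logits = np.where(same_seg & off_diag, -np.inf, logits)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy usage: segment 0 is treated as background, segments 1 and 2 as foreground.
rng = np.random.default_rng(0)
segment_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2])
foreground_mask = segment_ids != 0
features = rng.normal(size=(8, 16))
anchors, positives = sample_foreground_pairs(segment_ids, foreground_mask, 3, rng)
loss = info_nce(features, segment_ids, anchors, positives)
```

Restricting anchors to `foreground_mask` is what removes the background bias that uniform random sampling would introduce; the masking step is one simple way to avoid pushing apart points that belong to the same segment.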