Contrastive learning has recently shown great potential for unsupervised pre-training in 3D scene understanding tasks. However, most existing work randomly selects point features as anchors when building contrast pairs, which biases learning toward the background points that often dominate 3D scenes. Object awareness and foreground-to-background discrimination are also neglected, making contrastive learning less effective. To tackle these issues, we propose a general foreground-aware feature contrast (FAC) framework that learns more effective point cloud representations in pre-training. FAC consists of two novel contrast designs that construct more effective and informative contrast pairs. The first builds positive pairs within the same foreground segment, where points tend to share the same semantics. The second prevents over-discrimination between 3D segments/objects and encourages foreground-to-background distinctions at the segment level, using adaptive feature learning in a Siamese correspondence network that learns feature correlations within and across point cloud views. Visualization with point activation maps shows that our contrast pairs capture clear correspondences among foreground regions during pre-training. Quantitative experiments also show that FAC achieves superior knowledge transfer and data efficiency in various downstream 3D semantic segmentation and object detection tasks.
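To make the first contrast design concrete, the following is a minimal sketch of foreground-aware anchor sampling with an InfoNCE-style loss. It is not the authors' implementation: the function name `foreground_segment_info_nce`, its parameters, the temperature value, and the assumption that segment labels and a foreground mask are already available are all hypothetical, and the Siamese correspondence network is omitted entirely.

```python
import numpy as np

def foreground_segment_info_nce(feats_a, feats_b, segment_ids, is_foreground,
                                num_anchors=4, temperature=0.1, seed=0):
    """Hypothetical helper: InfoNCE-style loss with foreground-only anchors.

    feats_a, feats_b: (N, D) features of the same N points in two views.
    segment_ids:      (N,) integer segment label per point (assumed given).
    is_foreground:    (N,) boolean mask, True for foreground points (assumed given).
    """
    rng = np.random.default_rng(seed)
    # L2-normalise so that dot products are cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)

    # Sample anchors from foreground points only, instead of from all points.
    fg_idx = np.flatnonzero(is_foreground)
    if fg_idx.size == 0:
        raise ValueError("no foreground points to use as anchors")
    anchors = rng.choice(fg_idx, size=min(num_anchors, fg_idx.size), replace=False)

    # Similarity of each anchor (view A) to every point in view B.
    logits = a[anchors] @ b.T / temperature  # shape (A, N)
    # Positives: points in view B that lie in the anchor's segment.
    pos_mask = segment_ids[anchors][:, None] == segment_ids[None, :]

    # Numerically stable log-softmax over all points in view B.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average the negative log-probability over each anchor's positives.
    loss_per_anchor = -(log_prob * pos_mask).sum(axis=1) / pos_mask.sum(axis=1)
    return float(loss_per_anchor.mean()), anchors
```

In this sketch the loss is lower when points that share a foreground segment also have similar features, and background points only ever appear as negatives, never as anchors.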