The self-supervised learning framework data2vec has recently shown strong performance across several modalities using a masked student-teacher approach. However, it remains an open question whether such a framework generalizes to the unique challenges of 3D point clouds. To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-depth analysis, we find that leaked positional information reveals the overall object shape to the student even under heavy masking, which hinders data2vec from learning strong representations for point clouds. We address this 3D-specific shortcoming by proposing point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds. Our experiments show that point2vec outperforms other self-supervised methods on shape classification and few-shot learning on ModelNet40 and ScanObjectNN, while achieving competitive results on part segmentation on ShapeNetParts. These results suggest that the learned representations are strong and transferable, highlighting point2vec as a promising direction for self-supervised learning of point cloud representations.