Recently, the self-supervised learning framework data2vec has shown impressive performance across various modalities using a masked student-teacher approach. However, it remains an open question whether such a framework generalizes to the unique challenges of 3D point clouds. To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-depth analysis, we discover that the leakage of positional information reveals the overall object shape to the student even under heavy masking, and thus prevents data2vec from learning strong representations for point clouds. We address this 3D-specific shortcoming by proposing point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds. Our experiments show that point2vec outperforms other self-supervised methods on shape classification and few-shot learning on ModelNet40 and ScanObjectNN, while achieving competitive results on part segmentation on ShapeNetParts. These results suggest that the learned representations are strong and transferable, highlighting point2vec as a promising direction for self-supervised learning of point cloud representations.
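The positional leakage described above can be illustrated with a toy sketch. It is not the paper's implementation: the function `student_inputs`, the flag `feed_masked_positions`, the 64 patch centers on a unit sphere, and the 65% mask ratio are all hypothetical choices made for illustration. The sketch assumes that a point cloud is split into patches, each with a 3D center used as its position, and compares a student that receives mask tokens together with their positions against one that receives only the visible patches.

```python
import numpy as np


def student_inputs(centers, mask_ratio, rng, feed_masked_positions):
    """Return the patch-center positions a student would receive.

    Hypothetical helper: `centers` is an (N, 3) array of patch centers,
    `mask_ratio` is the fraction of patches whose content is hidden.
    """
    n = centers.shape[0]
    n_masked = int(round(mask_ratio * n))
    masked = np.zeros(n, dtype=bool)
    masked[rng.permutation(n)[:n_masked]] = True
    if feed_masked_positions:
        # Mask tokens hide patch content, but every token keeps its position,
        # so the student still receives the centers of all N patches.
        return centers, masked
    # Assumed alternative: the student receives visible patches only.
    return centers[~masked], masked


# Toy "object": 64 patch centers on a unit sphere.
centers = np.random.default_rng(0).normal(size=(64, 3))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)

leaky, masked = student_inputs(centers, 0.65, np.random.default_rng(1), True)
visible_only, _ = student_inputs(centers, 0.65, np.random.default_rng(1), False)

print("patches masked:", int(masked.sum()), "of", len(centers))
print("positions seen when mask tokens keep positions:", leaky.shape[0])
print("positions seen when only visible patches are fed:", visible_only.shape[0])
```

In the first case the student receives every patch center regardless of the mask ratio, so the coarse object shape stays recoverable from positions alone. Withholding masked positions from the student is shown here only as one conceivable remedy, not as a description of point2vec.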