In this paper we study the properties of transformers pretrained with self-supervision in the point cloud domain. Specifically, we evaluate the effectiveness of Masked Autoencoding as a pretraining scheme and explore Momentum Contrast as an alternative. We investigate the impact of data quantity on the learned features and uncover similarities in the transformer's behavior across domains. Through comprehensive visualizations, we observe that the transformer learns to attend to semantically meaningful regions, indicating that pretraining leads to a better understanding of the underlying geometry. Moreover, we examine the finetuning process and its effect on the learned representations. Based on these findings, we devise an unfreezing strategy that consistently outperforms our baseline without any other modifications to the model or the training pipeline, and achieve state-of-the-art results on the classification task among transformer models.