Recent advancements in self-supervised learning in the point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks such as lengthy pre-training time, the need for reconstruction in the input space, or the need for additional modalities. To address these issues, we introduce Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we introduce a sequencer that orders point cloud tokens so that token proximity can be computed and used efficiently from token indices during target and context selection. The sequencer also allows the token proximity computation to be shared between context and target selection, further improving efficiency. Experimentally, our method achieves results competitive with state-of-the-art methods while avoiding reconstruction in the input space and additional modalities.
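To make the idea of a sequencer concrete, the following is a minimal sketch of one way to order token centers so that nearby indices correspond to nearby tokens, and then to pick target and context blocks by index alone. It is an illustration under stated assumptions, not the implementation described in the paper: the greedy nearest-neighbor ordering, the choice of starting token, and the names `greedy_sequence` and `select_blocks` (with their parameters) are all hypothetical.

```python
import numpy as np


def greedy_sequence(centers: np.ndarray) -> np.ndarray:
    """Return an ordering of token centers in which consecutive indices are close.

    Assumption: a greedy nearest-neighbor walk is used as the ordering rule.
    """
    n = len(centers)
    # Pairwise Euclidean distances, computed once and reused at every step.
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    visited = np.zeros(n, dtype=bool)
    order = np.empty(n, dtype=np.int64)
    # Assumption: start from the center with the smallest coordinate sum.
    current = int(np.argmin(centers.sum(axis=1)))
    for step in range(n):
        order[step] = current
        visited[current] = True
        if step + 1 < n:
            # Move to the closest center that has not been visited yet.
            masked = np.where(visited, np.inf, dists[current])
            current = int(np.argmin(masked))
    return order


def select_blocks(num_tokens, num_targets, target_len, context_len, rng):
    """Pick contiguous index blocks as targets, then a context block that avoids them.

    Because the tokens are already sequenced, contiguity in index space stands in
    for spatial proximity, so no further distance computation is needed here.
    """
    targets = []
    for _ in range(num_targets):
        start = int(rng.integers(0, num_tokens - target_len + 1))
        targets.append(np.arange(start, start + target_len))
    target_idx = np.unique(np.concatenate(targets))
    start = int(rng.integers(0, num_tokens - context_len + 1))
    context = np.arange(start, start + context_len)
    # Drop context indices that overlap any target (may leave few or no tokens).
    context = context[~np.isin(context, target_idx)]
    return targets, context


rng = np.random.default_rng(0)
centers = rng.random((64, 3))          # hypothetical token centers
order = greedy_sequence(centers)       # ordering computed once
targets, context = select_blocks(len(order), 4, 8, 32, rng)
```

In this sketch the ordering is computed once per point cloud and both selection steps reuse it, which mirrors the shared-computation benefit the abstract attributes to the sequencer. The indices in `targets` and `context` refer to positions in the sequenced order, so `centers[order][context]` gives the corresponding centers.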