Recently, Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms. However, these methods often overlook latent information in less prominent regions, leading to increased sensitivity to perturbations and limited global comprehension. To address these limitations, we introduce PointACL, an attention-driven contrastive learning framework. Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions, enhancing its understanding of global structures within the point cloud. We then combine the original pre-training loss with a contrastive learning loss, improving feature discrimination and generalization. Extensive experiments validate the effectiveness of PointACL: it achieves state-of-the-art performance across a variety of 3D understanding tasks, including object classification, part segmentation, and few-shot learning. Specifically, when integrated with different Transformer backbones such as Point-MAE and PointGPT, PointACL demonstrates improved performance on datasets such as ScanObjectNN, ModelNet40, and ShapeNetPart. This highlights its superior capability in capturing both global and local features, as well as its enhanced robustness against perturbations and incomplete data.
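The two core ideas, masking the least-attended patches and adding a contrastive term to the pre-training objective, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the fixed mask ratio, and the use of an NT-Xent-style loss as the contrastive term are assumptions for illustration only.

```python
import numpy as np

def attention_driven_mask(attn_scores, mask_ratio=0.6):
    """Mask the least-attended patches (hypothetical sketch of the idea).

    attn_scores: (N,) mean attention each patch receives from the backbone.
    Returns a boolean array, True = patch is masked (hidden from the encoder),
    forcing the model to reason about under-attended regions.
    """
    n_mask = int(len(attn_scores) * mask_ratio)
    order = np.argsort(attn_scores)            # ascending: lowest attention first
    mask = np.zeros(len(attn_scores), dtype=bool)
    mask[order[:n_mask]] = True
    return mask

def nt_xent_loss(z1, z2, tau=0.07):
    """NT-Xent-style contrastive loss between two views of the same batch.

    z1, z2: (B, D) feature embeddings; row i of z1 and z2 form a positive pair.
    (Stand-in for the contrastive term; the paper's exact loss may differ.)
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                   # (B, B) cosine similarities
    # cross-entropy with the matching view as the positive class
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(z1))
    return -log_probs[idx, idx].mean()

# The combined objective is then (lambda_c is a hypothetical weight):
#   total_loss = pretrain_loss + lambda_c * nt_xent_loss(z_orig, z_masked)
```

In this sketch, one view comes from the original point cloud and the other from the attention-masked version, so the contrastive term pulls their global features together while the original pre-training loss (e.g. masked reconstruction) is kept unchanged.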