Although recent transformer-based architectures have brought remarkable advances in point cloud analysis, effectively learning both local and global structures within point clouds remains challenging. In this paper, we propose CDFormer, a new transformer architecture equipped with a collect-and-distribute mechanism to communicate short- and long-range contexts of point clouds. Specifically, we first use self-attention to capture short-range interactions within each local patch; the updated local features are then collected into a set of proxy reference points, from which we extract long-range contexts. Afterward, we distribute the learned long-range contexts back to local points via cross-attention. To supply position cues for both short- and long-range contexts, we also introduce a context-aware position encoding that facilitates position-aware communication between points. We perform experiments on four popular point cloud datasets, namely ModelNet40, ScanObjectNN, S3DIS, and ShapeNetPart, for classification and segmentation. Results show the effectiveness of the proposed CDFormer, which achieves several new state-of-the-art results on point cloud classification and segmentation tasks. The code is available at \url{https://github.com/haibo-qiu/CDFormer}.
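To make the collect-and-distribute flow concrete, the following is a minimal, dependency-free sketch of its four steps: local self-attention, collection into proxies, proxy self-attention, and distribution via cross-attention. It is an illustration under stated assumptions, not the CDFormer implementation: the function names `attend` and `collect_and_distribute` are hypothetical, attention uses identity projections instead of learned weights, proxies are formed by mean pooling (the abstract does not specify how features are collected), the patches are assumed to partition the points, and the context-aware position encoding is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention with identity projections (no learned weights)."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return outputs


def collect_and_distribute(features, patches):
    """features: one feature vector per point; patches: index lists that partition the points."""
    dim = len(features[0])

    # 1. Short-range: self-attention restricted to the points of each local patch.
    updated = [None] * len(features)
    for patch in patches:
        local = [features[i] for i in patch]
        for i, feat in zip(patch, attend(local, local, local)):
            updated[i] = feat

    # 2. Collect: one proxy per patch (ASSUMPTION: mean pooling of the updated features).
    proxies = [
        [sum(updated[i][d] for i in patch) / len(patch) for d in range(dim)]
        for patch in patches
    ]

    # 3. Long-range: self-attention among the proxies only.
    proxies = attend(proxies, proxies, proxies)

    # 4. Distribute: each point queries the proxies via cross-attention (residual add).
    context = attend(updated, proxies, proxies)
    return [[u + c for u, c in zip(feat, ctx)] for feat, ctx in zip(updated, context)]


# Toy input: four 2-D point features split into two patches of two points each.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
patches = [[0, 1], [2, 3]]
out = collect_and_distribute(features, patches)
```

Because the long-range step runs over one proxy per patch instead of over every point, its cost grows with the number of patches, which is the motivation for routing global context through a small proxy set.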