Transformer-based networks have achieved impressive performance in 3D point cloud understanding. However, most of them concentrate on aggregating local features and do not model global dependencies directly, which limits their effective receptive field. Moreover, how to combine local and global components effectively remains an open challenge. To address these problems, we propose the Asymmetric Parallel Point Transformer (APPT). Specifically, we introduce Global Pivot Attention to extract global features and enlarge the effective receptive field, and we design an Asymmetric Parallel structure to integrate local and global information. With these designs, APPT captures global features throughout the entire network while still focusing on fine local detail. Extensive experiments show that our method outperforms prior methods and achieves state-of-the-art results on several benchmarks for 3D point cloud understanding, including 3D semantic segmentation on S3DIS, 3D shape classification on ModelNet40, and 3D part segmentation on ShapeNet.