GTNet: Graph Transformer Network for 3D Point Cloud Classification and Semantic Segmentation

Recently, graph-based and Transformer-based deep learning networks have demonstrated excellent performance on various point cloud tasks. Most existing graph methods are based on static graphs, which establish graph relations from a fixed input. Moreover, many graph methods aggregate neighboring features by max or average pooling, so that either only a single neighboring point affects the feature of the centroid, or all neighboring points have the same influence on it; both cases ignore the correlations and differences between points. Most Transformer-based methods extract point cloud features with global attention and lack feature learning on local neighborhoods. To solve the problems of these two types of models, we propose a new feature extraction block named Graph Transformer and construct a 3D point cloud learning network called GTNet to learn features of point clouds at both local and global scales. Graph Transformer integrates the advantages of graph-based and Transformer-based methods, and consists of Local Transformer and Global Transformer modules. Local Transformer uses a dynamic graph, whose relations are updated dynamically, and computes a weight for every neighboring point by intra-domain cross-attention, so that every neighboring point can affect the features of the centroid with a different weight; Global Transformer enlarges the receptive field of Local Transformer by global self-attention. In addition, to avoid the vanishing gradients caused by increasing network depth, we apply residual connections to the centroid features in GTNet; we also use the features of the centroid and its neighbors to generate local geometric descriptors in Local Transformer, which strengthens the local information learning capability of the model. Finally, we apply GTNet to shape classification, part segmentation, and semantic segmentation tasks in this paper.
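
To make the contrast between pooling and attention-weighted aggregation concrete, the following is a minimal NumPy sketch and not the GTNet implementation. It assumes a k-nearest-neighbor graph rebuilt from the current features (one common way to obtain a dynamic graph), plain scaled dot-product attention in place of the paper's intra-domain cross-attention, and hypothetical names throughout (`knn_indices`, `local_attention`, `global_self_attention`, and the projection matrices `w_q`, `w_k`, `w_v`). The local geometric descriptors are omitted.

```python
import numpy as np

def knn_indices(feats, k):
    """Indices of the k nearest neighbors of each point in feature space."""
    sq = np.sum(feats ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T  # (N, N) squared distances
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(feats, k, w_q, w_k, w_v):
    """Attention-weighted aggregation over each centroid's neighbors (assumed form)."""
    idx = knn_indices(feats, k)   # graph rebuilt from the current features: "dynamic"
    neigh = feats[idx]            # (N, k, C) neighbor features
    q = feats @ w_q               # (N, D) queries come from the centroids
    key = neigh @ w_k             # (N, k, D) keys come from the neighbors
    val = neigh @ w_v             # (N, k, D) values come from the neighbors
    scores = np.einsum("nd,nkd->nk", q, key) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=1)  # one weight per neighbor, rows sum to 1
    out = np.einsum("nk,nkd->nd", weights, val)
    return out + feats, weights, val   # residual connection on centroid features (needs D == C)

def global_self_attention(feats, w_q, w_k, w_v):
    """Self-attention over all points, so every point can attend to every other."""
    q, key, val = feats @ w_q, feats @ w_k, feats @ w_v
    weights = softmax(q @ key.T / np.sqrt(q.shape[-1]), axis=1)  # (N, N)
    return weights @ val + feats

# Toy input: 32 points with 8-dimensional features (random, for illustration only).
rng = np.random.default_rng(0)
points = rng.normal(size=(32, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))

local_out, weights, val = local_attention(points, 4, w_q, w_k, w_v)
global_out = global_self_attention(local_out, w_q, w_k, w_v)

# The pooling baselines criticized above, applied to the same neighbor values:
max_pooled = val.max(axis=1)    # per channel, only the largest neighbor value survives
mean_pooled = val.mean(axis=1)  # every neighbor contributes with the same weight 1/k
```

In this sketch, mean pooling is the special case in which all attention weights equal 1/k, while max pooling discards all but one neighbor per channel; the learned weights sit between these two extremes and differ from neighbor to neighbor.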
