Several recent works have applied Transformers to 3D point cloud classification. To reduce computation, most existing methods restrict attention to local spatial neighborhoods, which ignores point content and fails to relate points that are distant but relevant. To overcome this limitation of local spatial attention, we propose a point content-based Transformer architecture, PointConT for short. It exploits the locality of points in the feature space (content-based): sampled points with similar features are clustered into the same class, and self-attention is computed within each class, which offers an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an Inception feature aggregator for point cloud classification, which uses parallel branches to aggregate high-frequency and low-frequency information separately. Extensive experiments show that PointConT achieves strong performance on point cloud shape classification. In particular, our method reaches 90.3% Top-1 accuracy on the hardest setting of ScanObjectNN. The source code is available at https://github.com/yahuiliu99/PointConT.
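To make the content-based idea concrete, the following is a minimal NumPy sketch of clustering points in feature space and computing self-attention only within each cluster. It is an illustration under stated assumptions, not the released PointConT code: the clustering here is plain k-means (the abstract does not specify the clustering algorithm), the query, key, and value projections are taken to be the identity, and the names `kmeans_labels` and `content_based_attention` are hypothetical.

```python
import numpy as np


def kmeans_labels(feats, k, iters=10, seed=0):
    """Plain k-means in feature space (an assumed stand-in for the
    paper's clustering step); returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct points (fancy indexing copies).
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center: (N, k).
        dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = feats[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def content_based_attention(feats, k):
    """Self-attention restricted to feature-space clusters.

    Assumption: identity Q/K/V projections, single head.
    """
    feats = np.asarray(feats, dtype=float)
    labels = kmeans_labels(feats, k)
    out = np.empty_like(feats)
    d = feats.shape[1]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        x = feats[idx]                           # (n_c, d) cluster members
        attn = softmax(x @ x.T / np.sqrt(d))     # (n_c, n_c) attention weights
        out[idx] = attn @ x                      # weighted sum within the cluster
    return out, labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(64, 8))   # 64 sampled points, 8-dim features
    out, labels = content_based_attention(feats, k=4)
    print(out.shape, np.bincount(labels, minlength=4))
```

Because points are grouped by feature similarity rather than spatial proximity, two points far apart in 3D space can still attend to each other if their features are close. With roughly balanced clusters, the attention matrices hold about N²/k scores in total instead of the N² needed for global attention, which is the trade-off the abstract refers to.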