Point cloud registration is a fundamental task in computer vision and robotics. Recent transformer-based methods have demonstrated enhanced performance in this domain. However, the standard attention mechanism used in these methods often integrates many low-relevance points, and therefore struggles to focus its attention weights on the sparse yet meaningful points. This inefficiency leads to limited local structure modeling capabilities and quadratic computational complexity. To overcome these limitations, we propose the Point Tree Transformer (PTT), a novel transformer-based approach for point cloud registration that efficiently extracts comprehensive local and global features while maintaining linear computational complexity. The PTT constructs hierarchical feature trees from point clouds in a coarse-to-dense manner, and introduces a novel Point Tree Attention (PTA) mechanism, which follows the tree structure to facilitate the progressive convergence of attended regions towards salient points. Specifically, each tree layer selectively identifies a subset of key points with the highest attention scores. Subsequent layers then focus attention on areas of significant relevance, derived from the child points of the selected point set. The feature extraction process additionally incorporates coarse point features that capture high-level semantic information, thus facilitating local structure modeling and the progressive integration of multiscale information. Consequently, PTA empowers the model to concentrate on crucial local structures and derive detailed local information while maintaining linear computational complexity. Extensive experiments on the 3DMatch, ModelNet40, and KITTI datasets demonstrate that our method outperforms state-of-the-art methods.
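The coarse-to-dense selection behind PTA can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's formulation: the two-level tree, the selection size `k_select`, and the scaled dot-product scoring are all illustrative choices. The key idea shown is that each layer scores only a small candidate set and the next layer attends only over the children of the top-scoring nodes, so the cost scales with the selected subset rather than with the full point count.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8                       # feature dimension
n_coarse, n_child = 16, 4   # 16 coarse nodes, 4 children each (64 dense points total)

# Hypothetical two-level feature tree: coarse keys plus child keys/values.
coarse_keys = rng.normal(size=(n_coarse, d))
child_keys  = rng.normal(size=(n_coarse, n_child, d))
child_vals  = rng.normal(size=(n_coarse, n_child, d))
query       = rng.normal(size=(d,))

k_select = 4  # number of highest-scoring coarse nodes kept for the next layer

# Layer 1: score all coarse nodes, keep only the top-k (sparse selection).
coarse_scores = coarse_keys @ query
top = np.argsort(coarse_scores)[-k_select:]

# Layer 2: attend only over children of the selected coarse nodes, so the
# attended set has k_select * n_child entries, independent of the total
# number of dense points.
sel_keys = child_keys[top].reshape(-1, d)
sel_vals = child_vals[top].reshape(-1, d)
weights  = softmax(sel_keys @ query / np.sqrt(d))
output   = weights @ sel_vals

print(output.shape)   # attended feature of dimension d
```

In the actual PTT, this selection is repeated layer by layer down the feature tree, and coarse point features are injected into the finer layers to carry high-level semantic context; the sketch above only captures the single selection step.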