Self-attention modules have demonstrated remarkable capabilities in capturing long-range relationships and improving performance on point cloud tasks. However, point cloud objects are typically characterized by complex, unordered, non-Euclidean spatial structures at multiple scales, and their behavior is often dynamic and unpredictable. Current self-attention modules mostly rely on dot-product multiplication and dimension alignment among query, key, and value features, which cannot adequately capture the multi-scale non-Euclidean structures of point cloud objects. To address these problems, this paper proposes a plug-in self-attention module and its variants, the Multi-scale Geometry-aware Transformer (MGT). MGT processes point cloud data with multi-scale local and global geometric information in three ways. First, MGT divides point cloud data into patches at multiple scales. Second, a local feature extractor based on sphere mapping explores the geometry within each patch and generates a fixed-length representation for each patch. Third, the fixed-length representations are fed into a novel geodesic-based self-attention to capture the global non-Euclidean geometry between patches. Finally, all the modules are integrated into the MGT framework and trained end to end. Experimental results demonstrate that MGT substantially improves the ability of the self-attention mechanism to capture multi-scale geometry and achieves competitive performance on mainstream point cloud benchmarks.
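The three stages described in the abstract (multi-scale patching, a fixed-length descriptor per patch, and attention that accounts for geodesic distance between patches) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the paper's implementation: patches are formed by farthest point sampling plus k-nearest neighbours, `patch_descriptor` is a hand-written placeholder rather than the learned sphere-mapping extractor, geodesic distance is approximated by shortest paths on a kNN graph of patch centres, and the attention simply subtracts that distance from the dot-product scores. All function names and the parameter `tau` are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, n_centers):
    """Greedy FPS: return indices of n_centers points spread over the cloud."""
    chosen = [0]  # arbitrary starting point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_centers - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def make_patches(points, n_centers, k):
    """Group the k nearest points around each FPS centre into one patch."""
    centers = points[farthest_point_sampling(points, n_centers)]
    d = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return centers, points[idx]  # shapes (C, 3) and (C, k, 3)

def patch_descriptor(centers, patches):
    """Placeholder fixed-length descriptor (NOT the paper's sphere mapping):
    neighbours are projected onto the unit sphere around the patch centre;
    the mean direction and the mean/std of the radii are concatenated."""
    offsets = patches - centers[:, None, :]
    radius = np.linalg.norm(offsets, axis=2)
    dirs = offsets / np.maximum(radius, 1e-9)[..., None]
    return np.concatenate(
        [dirs.mean(axis=1),
         radius.mean(axis=1, keepdims=True),
         radius.std(axis=1, keepdims=True)],
        axis=1,
    )  # shape (C, 5)

def graph_geodesics(centers, k):
    """Approximate geodesic distances by shortest paths on a kNN graph."""
    n = len(centers)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    g = np.full((n, n), np.inf)
    nn = np.argsort(d, axis=1)[:, :k + 1]  # k neighbours plus the point itself
    rows = np.repeat(np.arange(n), k + 1)
    g[rows, nn.ravel()] = d[rows, nn.ravel()]
    g = np.minimum(g, g.T)  # make the graph undirected
    np.fill_diagonal(g, 0.0)
    for m in range(n):  # vectorised Floyd-Warshall
        g = np.minimum(g, g[:, m:m + 1] + g[m:m + 1, :])
    return g

def geodesic_attention(feats, geo, tau=1.0):
    """Dot-product self-attention with a penalty on geodesic distance.
    Unreachable pairs (distance inf) receive zero weight."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1]) - geo / tau
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(256, 3))
    cloud /= np.linalg.norm(cloud, axis=1, keepdims=True)  # points on a sphere
    for n_centers, k in [(32, 8), (16, 16)]:  # two illustrative scales
        centers, patches = make_patches(cloud, n_centers, k)
        feats = patch_descriptor(centers, patches)
        out, weights = geodesic_attention(feats, graph_geodesics(centers, k=4))
        print(n_centers, k, out.shape)
```

In this sketch each scale is processed independently. A learned model would replace the placeholder descriptor with a trained extractor and use separate query, key, and value projections, neither of which is shown here.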