Although point cloud registration has advanced remarkably for object-level and indoor scenes, large-scale registration methods remain rarely explored. The main challenges are the large number of points, the complex point distribution, and the outliers found in outdoor LiDAR scans. In addition, most existing registration methods adopt a two-stage paradigm: they first find correspondences by extracting discriminative local features, and then use robust estimators (e.g., RANSAC) to filter outliers. This makes them highly dependent on well-designed descriptors and post-processing choices. To address these problems, we propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment without any further post-processing. Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers by extracting point features globally. Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes. Furthermore, to effectively reduce mismatches, a bijective association transformer is designed for regressing the initial transformation. Extensive experiments on the KITTI and NuScenes datasets demonstrate that RegFormer achieves competitive performance in terms of both accuracy and efficiency.
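For context, the two-stage paradigm mentioned above can be sketched as follows. This is a minimal illustration of the conventional baseline, not of RegFormer itself: it assumes putative correspondences are already given as row-aligned arrays `src` and `dst`, and it swaps in a plain RANSAC loop around a closed-form SVD (Kabsch) solver. The function names `kabsch` and `ransac_rigid`, and the parameters `n_iters` and `inlier_thresh`, are hypothetical choices for this sketch.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= src @ R.T + t.

    src and dst are (N, 3) arrays of corresponding points, N >= 3.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_rigid(src, dst, n_iters=200, inlier_thresh=0.1, seed=0):
    """Filter outlier correspondences with RANSAC, then refit on the inliers.

    n_iters and inlier_thresh are illustrative defaults (assumptions),
    not values taken from any particular registration method.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Three correspondences are the minimal sample for a rigid transform.
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found fewer than 3 inlier correspondences")
    # Final least-squares fit on the largest consensus set.
    R, t = kabsch(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

The sketch makes the dependence noted in the abstract concrete: the result hinges on the quality of the input correspondences and on hand-picked settings such as the inlier threshold and the iteration count, which is the post-processing that an end-to-end network aims to avoid.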