Previous methods treat feature matching and pose estimation as a two-stage process: they first find matches and then estimate the pose. Because they ignore the geometric relationship between the two tasks, they focus on either improving the quality of matches or filtering potential outliers, which limits their efficiency or accuracy. In contrast, we propose an iterative matching and pose estimation framework (IMP) that leverages the geometric connections between the two tasks: a few good matches are enough for a roughly accurate pose estimate, and a roughly accurate pose can in turn guide the matching by providing geometric constraints. To this end, we implement a geometry-aware recurrent attention-based module that jointly outputs sparse matches and camera poses. Specifically, we first implicitly embed geometric information into the module at each iteration via a pose-consistency loss, allowing it to predict geometry-aware matches progressively. Second, we introduce an \textbf{e}fficient IMP, called EIMP, which dynamically discards keypoints without potential matches, avoiding redundant updates and significantly reducing the cost of attention computation in transformers, which is quadratic in the number of keypoints. Experiments on the YFCC100M, ScanNet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in terms of accuracy and efficiency.
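The alternation described above (a few good matches give a rough pose, the pose then guides further matching, and keypoints without potential matches are discarded) can be illustrated with a small classical toy. This is only a sketch under strong assumptions, not the learned recurrent attention module of IMP/EIMP: it uses 2D points and a rigid 2D transform in place of camera pose, a least-squares (Kabsch) fit in place of the network's pose output, and a distance-based pruning rule. All function names, thresholds, and the synthetic scene are hypothetical.

```python
import numpy as np

def estimate_rigid_2d(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, mu_d - R @ mu_s

def guided_matches(a, b, R, t, radius):
    """Mutual nearest neighbours between the projected a and b, within `radius`."""
    proj = a @ R.T + t
    dist = np.linalg.norm(proj[:, None, :] - b[None, :, :], axis=-1)  # (Na, Nb)
    nn_ab, nn_ba = dist.argmin(axis=1), dist.argmin(axis=0)
    idx_a = np.arange(len(a))
    keep = (nn_ba[nn_ab] == idx_a) & (dist[idx_a, nn_ab] < radius)
    return idx_a[keep], nn_ab[keep], dist.min(axis=1)

def iterative_match_and_pose(a, b, seed_a, seed_b, n_iters=3,
                             radius=0.5, prune_radius=2.0):
    """Alternate pose-guided matching and pose re-estimation, pruning keypoints of a.

    Assumption: pruning is one-sided and purely distance-based, a stand-in for
    EIMP's learned decision to discard keypoints without potential matches.
    """
    active = np.arange(len(a))                           # keypoints of a still updated
    R, t = estimate_rigid_2d(a[seed_a], b[seed_b])       # rough pose from a few matches
    for _ in range(n_iters):
        ia, ib, nearest = guided_matches(a[active], b, R, t, radius)
        matches = np.stack([active[ia], ib], axis=1)     # global (index_a, index_b) pairs
        if len(matches) >= 2:                            # a 2D rigid fit needs >= 2 pairs
            R, t = estimate_rigid_2d(a[matches[:, 0]], b[matches[:, 1]])
        active = active[nearest < prune_radius]          # drop keypoints with no candidate
    return R, t, matches, active

def make_toy_scene(theta=0.3, t=(0.5, -0.2)):
    """Synthetic scene: a 5x5 grid seen under a rigid motion, plus 5 outliers per set."""
    gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)    # 25 matchable keypoints
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    a = np.vstack([grid, [[50.0 + i, 50.0] for i in range(5)]])
    b = np.vstack([grid @ R.T + np.asarray(t),
                   [[-50.0 - i, -50.0] for i in range(5)]])
    return a, b, R

a, b, R_true = make_toy_scene()
corners = [0, 4, 20, 24]                                 # four seed matches (grid corners)
R, t, matches, active = iterative_match_and_pose(a, b, corners, corners)
print(len(matches), len(active))                         # prints "25 25"
```

Starting from four seed correspondences, the loop recovers all 25 matchable pairs and reduces the active set from 30 keypoints to 25. In an attention-based matcher, shrinking the active set in this way is what lowers the per-iteration cost, since the attention score matrix grows with the product of the keypoint counts.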