Previous methods solve feature matching and pose estimation in a two-stage process: they first find matches and then estimate the pose. Because they ignore the geometric relationships between the two tasks, these methods focus either on improving the quality of matches or on filtering potential outliers, which limits their efficiency or accuracy. In contrast, we propose an iterative matching and pose estimation framework (IMP) that leverages the geometric connections between the two tasks. A few good matches are enough for a roughly accurate pose estimate, and a roughly accurate pose can in turn guide matching by providing geometric constraints. To this end, we implement a geometry-aware recurrent attention-based module that jointly outputs sparse matches and camera poses. Specifically, at each iteration we implicitly embed geometric information into the module via a pose-consistency loss, allowing it to predict geometry-aware matches progressively. In addition, we introduce an \textbf{e}fficient IMP, called EIMP, which dynamically discards keypoints that have no potential matches. This avoids redundant updates and significantly reduces the cost of attention computation in transformers, which is quadratic in the number of keypoints. Experiments on the YFCC100m, ScanNet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in both accuracy and efficiency.
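The match-then-pose-then-match loop described in the abstract can be illustrated with a small toy. The following is a minimal NumPy sketch under loudly stated assumptions, not the authors' IMP/EIMP implementation. A 2D rigid transform stands in for the relative camera pose. Raw descriptor dot products plus mutual nearest neighbours stand in for the learned recurrent attention module. The geometric guidance is a hypothetical squared-distance penalty with weight `lam`, not the pose-consistency loss (which acts at training time). The names `iterative_match_and_pose`, `fit_rigid_2d`, `mutual_nn`, `lam`, `top_k` and `prune_below` are all illustrative. The first pass fits a rough pose from only the `top_k` most confident matches. Later passes re-score candidates using that pose. In the spirit of EIMP, keypoints whose best score falls below `prune_below` are dropped, so the pairwise score matrix (used here as a proxy for quadratic attention cost) shrinks.

```python
import numpy as np

# Hypothetical toy, not the IMP/EIMP code: a 2D rigid transform plays the role of the
# camera pose, and descriptor dot products replace the learned attention module.


def unit(x):
    """L2-normalise rows so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def fit_rigid_2d(src, dst):
    """Kabsch: least-squares R, t with dst ~= src @ R.T + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # rule out reflections
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s


def mutual_nn(scores, min_score):
    """Row/column indices of mutual nearest neighbours scoring above min_score."""
    rows = np.arange(scores.shape[0])
    best_b, best_a = scores.argmax(1), scores.argmax(0)
    s = scores[rows, best_b]
    ok = (best_a[best_b] == rows) & (s > min_score)
    return rows[ok], best_b[ok], s[ok]


def iterative_match_and_pose(kpts_a, kpts_b, desc_a, desc_b, n_iters=4, lam=0.05,
                             min_score=0.3, prune_below=0.2, top_k=8):
    keep_a, keep_b = np.arange(len(kpts_a)), np.arange(len(kpts_b))
    R = t = None
    history = []
    for _ in range(n_iters):
        n_pairs = len(keep_a) * len(keep_b)  # proxy for quadratic attention cost
        scores = desc_a[keep_a] @ desc_b[keep_b].T  # appearance similarity only
        if R is not None:
            # Geometric guidance: penalise candidates far from the pose-predicted location.
            pred = kpts_a[keep_a] @ R.T + t
            dist2 = ((pred[:, None, :] - kpts_b[keep_b][None, :, :]) ** 2).sum(-1)
            scores = scores - lam * dist2
        ia, ib, s = mutual_nn(scores, min_score)
        # A few confident matches give a rough first pose; later passes use all matches.
        use = np.argsort(-s)[:top_k] if R is None else np.arange(len(s))
        R, t = fit_rigid_2d(kpts_a[keep_a[ia[use]]], kpts_b[keep_b[ib[use]]])
        history.append((n_pairs, keep_a[ia], keep_b[ib], R, t))
        # EIMP-like pruning: drop keypoints whose best candidate score is too low.
        keep_a = keep_a[scores.max(1) > prune_below]
        keep_b = keep_b[scores.max(0) > prune_below]
    return history


# Synthetic scene: the first n_shared points of A reappear in B (with pixel noise);
# the remaining A points are not co-visible, and the last n_only_b points of B are distractors.
rng = np.random.default_rng(0)
n_shared, n_only_a, n_only_b, dim = 40, 20, 20, 32
theta, t_gt = np.deg2rad(25.0), np.array([12.0, -7.0])
R_gt = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
kpts_a = rng.uniform(0, 100, size=(n_shared + n_only_a, 2))
kpts_b = np.vstack([kpts_a[:n_shared] @ R_gt.T + t_gt + rng.normal(0, 0.5, (n_shared, 2)),
                    rng.uniform(0, 100, size=(n_only_b, 2))])
desc_a = unit(rng.normal(size=(len(kpts_a), dim)))
desc_b = unit(np.vstack([desc_a[:n_shared] + 0.8 * unit(rng.normal(size=(n_shared, dim))),
                         rng.normal(size=(n_only_b, dim))]))

history = iterative_match_and_pose(kpts_a, kpts_b, desc_a, desc_b)
for it, (n_pairs, ma, mb, R, t) in enumerate(history):
    precision = np.mean((ma == mb) & (ma < n_shared))  # correct = same shared index
    M = R_gt.T @ R
    rot_err = np.degrees(abs(np.arctan2(M[1, 0], M[0, 0])))
    print(f"iter {it}: pairs={n_pairs}, matches={len(ma)}, "
          f"precision={precision:.2f}, rot_err={rot_err:.2f} deg")
```

On this synthetic data, the first pass may admit a few wrong matches between non-co-visible keypoints and distractors. Later passes tend to reject them once the pose-based penalty is active, and they score fewer pairs after pruning. Exact numbers depend on the random seed and the hypothetical thresholds.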