Attention-based graph neural networks have made great progress in learning feature matching. However, the literature lacks insight into how the attention mechanism works for feature matching. In this paper, we rethink cross- and self-attention from the viewpoint of traditional feature matching and filtering. To facilitate the learning of matching and filtering, we inject the similarity of descriptors into the cross-attention scores and the relative positions into the self-attention scores. In this way, the attention can focus on learning residual matching and filtering functions with reference to basic functions that measure visual and spatial correlation. Moreover, we mine intra- and inter-neighbors according to the similarity of descriptors and relative positions. Sparse attention for each point can then be performed only within its neighborhoods, yielding higher computational efficiency. Feature matching networks equipped with our full and sparse residual attention learning strategies are termed ResMatch and sResMatch, respectively. Extensive experiments on feature matching, pose estimation, and visual localization confirm the superiority of our networks.
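To make the residual idea concrete, here is a minimal NumPy sketch. The attention logits are a dot-product term plus a fixed prior score: descriptor cosine similarity for cross-attention and a spatial proximity term for self-attention. The optional sparse mode keeps only each point's `top_k` neighbors ranked by that prior. The exact prior forms (cosine similarity divided by a `temperature`, a Gaussian of keypoint distance with width `sigma`), the function names, and the `top_k` parameter are hypothetical choices for illustration, not the ResMatch formulation. The raw descriptors also stand in for the learned query/key/value projections a real network would use.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def descriptor_prior(desc_a, desc_b, temperature=0.1):
    # Hypothetical visual-correlation prior: cosine similarity of descriptors.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    return (a @ b.T) / temperature


def position_prior(kpts, sigma=32.0):
    # Hypothetical spatial-correlation prior: nearby keypoints score higher.
    diff = kpts[:, None, :] - kpts[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    return -dist2 / (2.0 * sigma ** 2)


def residual_attention(q, k, v, prior, top_k=None):
    """Attention whose logits = scaled dot product + fixed prior.

    q: (N, d); k, v: (M, d); prior: (N, M).
    If top_k is set, each query attends only to its top_k keys ranked by prior.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + prior
    if top_k is not None:
        # Column indices of the top_k largest prior values in each row.
        idx = np.argpartition(-prior, top_k - 1, axis=1)[:, :top_k]
        mask = np.full(prior.shape, -np.inf)
        np.put_along_axis(mask, idx, 0.0, axis=1)
        logits = logits + mask
    weights = softmax(logits, axis=1)
    return weights @ v, weights


rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 8))
desc_b = rng.normal(size=(6, 8))
kpts_a = rng.uniform(0, 640, size=(5, 2))

# Full cross-attention: image A queries image B, biased by descriptor similarity.
out_c, w_c = residual_attention(desc_a, desc_b, desc_b, descriptor_prior(desc_a, desc_b))
# Sparse self-attention within image A: each point sees its 3 spatial neighbors.
out_s, w_s = residual_attention(desc_a, desc_a, desc_a, position_prior(kpts_a), top_k=3)
```

The sparse mode above masks a dense logit matrix for readability, so it saves no computation. An efficient version would gather only the `top_k` neighbor keys per query, which is where the cost reduction described in the abstract comes from.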