Learning-based feature matching methods have been widely studied in recent years. The core problem in learning feature matching is how to learn (1) discriminative representations for feature points (or regions) within each image and (2) consensus representations for feature points across images. Recently, self- and cross-attention models have been exploited to address this problem. However, in many scenes the features are large in scale, redundant, and contaminated by outliers. Previous self-/cross-attention models generally conduct message passing over all primal features, which leads to redundant learning and high computational cost. To mitigate these limitations, and inspired by recent seed matching methods, we propose a novel, efficient Anchor Matching Transformer (AMatFormer) for the feature matching problem. AMatFormer has two main aspects. First, it conducts self-/cross-attention mainly on a set of anchor features and uses these anchor features as a message bottleneck to learn the representations of all primal features; it can therefore be implemented efficiently and compactly. Second, AMatFormer adopts a shared FFN module to further embed the features of the two images into a common domain and thus learn consensus feature representations for the matching problem. Experiments on several benchmarks demonstrate the effectiveness and efficiency of the proposed AMatFormer matching approach.
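
The anchor-bottleneck idea can be illustrated with a minimal NumPy sketch. It is not the AMatFormer implementation: the function names (`attend`, `shared_ffn`, `anchor_layer`), the residual connections, the absence of learned query/key/value projections, and the way anchor indices are supplied are all assumptions made for illustration. The sketch only shows why attention restricted to k anchors costs on the order of k^2 + Nk score entries rather than N^2, and how one set of FFN weights can be applied to both images.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Scaled dot-product attention: each query row aggregates the value rows.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def shared_ffn(x, w1, w2):
    # Two-layer ReLU MLP; the same weights are reused for both images.
    return np.maximum(x @ w1, 0.0) @ w2

def anchor_layer(fx, fy, idx_x, idx_y, w1, w2):
    # fx: (N, d) features of image 1, fy: (M, d) features of image 2.
    # idx_x, idx_y: indices of the anchor features (hypothetical inputs here).
    ax, ay = fx[idx_x], fy[idx_y]

    # Self-attention among the anchors of each image: (k x k) scores.
    ax = ax + attend(ax, ax, ax)
    ay = ay + attend(ay, ay, ay)

    # Cross-attention between the two anchor sets: (k x k) scores.
    ax_new = ax + attend(ax, ay, ay)
    ay_new = ay + attend(ay, ax, ax)

    # Anchors act as a bottleneck: every primal feature reads only from
    # the k updated anchors of its own image, giving (N x k) scores.
    fx = fx + attend(fx, ax_new, ax_new)
    fy = fy + attend(fy, ay_new, ay_new)

    # Shared FFN maps both feature sets with identical weights.
    return fx + shared_ffn(fx, w1, w2), fy + shared_ffn(fy, w1, w2)

rng = np.random.default_rng(0)
n, m, d, k = 200, 180, 16, 8            # illustrative sizes, not from the paper
fx = rng.standard_normal((n, d))
fy = rng.standard_normal((m, d))
w1 = 0.1 * rng.standard_normal((d, 2 * d))
w2 = 0.1 * rng.standard_normal((2 * d, d))
idx_x = rng.choice(n, size=k, replace=False)   # random anchors, for illustration only
idx_y = rng.choice(m, size=k, replace=False)

out_x, out_y = anchor_layer(fx, fy, idx_x, idx_y, w1, w2)
print(out_x.shape, out_y.shape)
```

In this sketch the anchors are drawn at random purely to keep the example self-contained; a seed-matching style selection would be a natural substitute, but the abstract does not specify it.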