Establishing correct correspondences between feature points is a fundamental task in computer vision. However, the presence of numerous outliers among the putative matches can significantly degrade the matching results, reducing the accuracy and robustness of the process. A further challenge arises when the proportion of outliers is large: how to extract high-quality information while reducing the errors introduced by negative samples. To address these issues, in this paper we propose a novel method, the Layer-by-Layer Hierarchical Attention Network, which improves the precision of feature point matching by suppressing outliers. Our method combines stage-wise fusion, hierarchical extraction, and an attention mechanism to strengthen the network's representation capability by emphasizing the rich semantic information of feature points. Specifically, we introduce a layer-by-layer channel fusion module that preserves the semantic information from each stage and fuses it into a unified representation, thereby enhancing the representation of the feature points. Additionally, we design a hierarchical attention module that adaptively captures and fuses global perception and structural semantic information using an attention mechanism. Finally, we propose two architectures to extract and integrate features, improving the adaptability of our network. We conduct experiments on two public datasets, YFCC100M and SUN3D, and the results demonstrate that our proposed method outperforms several state-of-the-art techniques in both outlier removal and camera pose estimation. Source code is available at http://www.linshuyuan.com.
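To make the two components named above concrete, the following is a minimal, illustrative sketch, not the authors' released implementation, of a layer-by-layer channel fusion step and an attention-based fusion of global and per-point structural cues. All module names, channel sizes, and the (B, C, 1, N) tensor layout commonly used for correspondence networks are assumptions introduced for illustration only.

```python
# Minimal sketch (assumed design, not the paper's official code) of layer-by-layer
# channel fusion over per-stage features and an adaptive global/local attention fusion.
import torch
import torch.nn as nn

class LayerByLayerChannelFusion(nn.Module):
    """Progressively fuses each stage's features into a running representation."""
    def __init__(self, channels=128, num_stages=4):
        super().__init__()
        # one 1x1 conv per fusion step, merging (accumulated, current-stage) features
        self.fuse = nn.ModuleList(
            [nn.Conv2d(2 * channels, channels, kernel_size=1) for _ in range(num_stages - 1)]
        )

    def forward(self, stage_feats):
        # stage_feats: list of tensors, each of shape (B, C, 1, N), one per stage
        fused = stage_feats[0]
        for conv, feat in zip(self.fuse, stage_feats[1:]):
            fused = conv(torch.cat([fused, feat], dim=1))  # layer-by-layer fusion
        return fused

class HierarchicalAttention(nn.Module):
    """Adaptively weights global context against per-point structural features."""
    def __init__(self, channels=128):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        # x: (B, C, 1, N); global context is an average over all correspondences
        g = x.mean(dim=-1, keepdim=True).expand_as(x)
        a = self.gate(torch.cat([x, g], dim=1))   # learned attention weights
        return a * g + (1.0 - a) * x              # fuse global and local cues

if __name__ == "__main__":
    feats = [torch.randn(2, 128, 1, 500) for _ in range(4)]  # 4 stages, 500 putative matches
    fused = LayerByLayerChannelFusion()(feats)
    print(HierarchicalAttention()(fused).shape)  # torch.Size([2, 128, 1, 500])
```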