Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey reviews feature matching modality by modality, covering traditional handcrafted methods and emphasizing contemporary deep learning approaches across RGB images, depth images, 3D point clouds, LiDAR scans, medical images, and vision-language interactions. Traditional methods, which rely on detectors such as Harris corners and descriptors such as SIFT and ORB, are robust to moderate intra-modality variations but struggle with significant modality gaps. Contemporary deep learning-based methods, exemplified by the CNN-based detector-and-descriptor network SuperPoint and the transformer-based, detector-free matcher LoFTR, substantially improve robustness and adaptability across modalities. We highlight modality-aware advancements, such as geometric and depth-specific descriptors for depth images, sparse and dense learning methods for 3D point clouds, attention-enhanced neural networks for LiDAR scans, and specialized solutions like the MIND descriptor for complex medical image matching. Cross-modal applications, particularly medical image registration and vision-language tasks, show how feature matching has evolved to handle increasingly diverse data interactions.