Keypoint detection is the foundation of many computer vision tasks, including image registration, structure-from-motion, 3D reconstruction, visual odometry, and SLAM. Traditional detectors (SIFT, SURF, ORB, BRISK, etc.) and learning-based methods (SuperPoint, R2D2, LF-Net, D2-Net, etc.) have shown strong performance yet suffer from key limitations: sensitivity to photometric changes, low keypoint density and repeatability, limited adaptability to challenging scenes, and a lack of semantic understanding, often failing to prioritize visually important regions. We present DeepDetect, an intelligent, all-in-one, dense keypoint detector that unifies the strengths of classical detectors through deep learning. First, we create ground-truth masks by fusing the outputs of 7 keypoint detectors and 2 edge detectors, capturing diverse visual cues ranging from corners and blobs to prominent edges and textures. A lightweight and efficient model, ESPNet, is then trained using these masks as labels, enabling DeepDetect to attend to semantically important regions while producing highly dense keypoints that adapt to diverse and visually degraded conditions. Evaluations on the Oxford Affine Covariant Regions dataset demonstrate that DeepDetect surpasses other detectors in keypoint density, repeatability, and the number of correct matches, achieving maximum values of 0.5143 (average keypoint density), 0.9582 (average repeatability), and 59,003 (correct matches).
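To make the ground-truth generation step concrete, the sketch below illustrates one plausible way to fuse classical detector outputs into a per-pixel supervision mask using OpenCV. The abstract does not name the 7 keypoint detectors or the 2 edge detectors, so the specific choice here (SIFT, ORB, FAST, BRISK, AKAZE, GFTT, and MSER for keypoints; Canny and a thresholded Sobel magnitude for edges) is an assumption for illustration only, not the authors' exact recipe.

```python
import cv2
import numpy as np


def build_groundtruth_mask(gray: np.ndarray) -> np.ndarray:
    """Fuse classical keypoint and edge detector outputs into a binary mask.

    `gray` is a single-channel uint8 image; the returned mask marks every
    pixel flagged by at least one detector. Detector choices are illustrative
    assumptions, not the paper's confirmed configuration.
    """
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=np.uint8)

    # 7 assumed keypoint detectors (corners and blobs).
    keypoint_detectors = [
        cv2.SIFT_create(),
        cv2.ORB_create(),
        cv2.FastFeatureDetector_create(),
        cv2.BRISK_create(),
        cv2.AKAZE_create(),
        cv2.GFTTDetector_create(),
        cv2.MSER_create(),
    ]
    for det in keypoint_detectors:
        for kp in det.detect(gray, None):
            x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
            if 0 <= x < w and 0 <= y < h:
                mask[y, x] = 255

    # 2 assumed edge detectors: Canny plus a thresholded Sobel magnitude.
    canny = cv2.Canny(gray, 100, 200)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    grad_mag = np.hypot(gx, gy)
    sobel_edges = (grad_mag > grad_mag.mean() + 2.0 * grad_mag.std()).astype(np.uint8) * 255

    mask = cv2.bitwise_or(mask, canny)
    mask = cv2.bitwise_or(mask, sobel_edges)
    return mask


if __name__ == "__main__":
    # Example usage: the resulting mask would serve as the per-pixel label
    # for training a segmentation-style network such as ESPNet.
    img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
    label_mask = build_groundtruth_mask(img)
    cv2.imwrite("example_mask.png", label_mask)
```

In this sketch each detector contributes a vote by marking its response locations in a shared binary mask; the union of all responses then acts as a dense, multi-cue label, which is consistent with the abstract's description of combining corner, blob, edge, and texture cues into a single supervision signal.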