Keypoint detection is the foundation of many computer vision tasks, including image registration, structure-from-motion, 3D reconstruction, visual odometry, and SLAM. Traditional detectors (SIFT, ORB, BRISK, FAST, etc.) and learning-based methods (SuperPoint, R2D2, QuadNet, LIFT, etc.) have delivered strong performance gains, yet they suffer from key limitations: sensitivity to photometric changes, low keypoint density and repeatability, limited adaptability to challenging scenes, and a lack of semantic understanding that often leaves visually important regions unprioritized. We present DeepDetect, an intelligent, all-in-one, dense detector that unifies the strengths of classical detectors using deep learning. First, we create ground-truth masks by fusing the outputs of 7 keypoint detectors and 2 edge detectors, capturing diverse visual cues that range from corners and blobs to prominent edges and textures. Next, a lightweight and efficient model, ESPNet, is trained with the fused masks as labels, enabling DeepDetect to focus semantically on images while producing highly dense keypoints that adapt to diverse and visually degraded conditions. Evaluations on the Oxford, HPatches, and Middlebury datasets show that DeepDetect surpasses other detectors, achieving maximum values of 0.5143 (average keypoint density), 0.9582 (average repeatability), 338,118 (correct matches), and 842,045 (voxels in stereo 3D reconstruction).
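As a rough illustration of the mask-fusion step, the sketch below combines per-detector binary masks into a single label mask and measures its keypoint density. The abstract does not state the fusion rule or the exact density formula, so two assumptions are made here: fusion is a pixel-wise union (logical OR), and density is the fraction of image pixels marked as keypoints. The function names `fuse_masks` and `keypoint_density` and the toy masks are hypothetical, and the real detector outputs are replaced by hand-made arrays.

```python
import numpy as np


def fuse_masks(masks):
    """Fuse per-detector binary masks of identical shape.

    Assumption: fusion is a pixel-wise union (logical OR); the abstract
    does not specify the actual rule.
    """
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stacked.any(axis=0)


def keypoint_density(mask):
    """Fraction of pixels marked as keypoints (assumed definition)."""
    mask = np.asarray(mask, dtype=bool)
    return np.count_nonzero(mask) / mask.size


# Toy 4x4 masks standing in for the outputs of three detectors.
corner_mask = np.zeros((4, 4), dtype=bool)
corner_mask[0, 0] = True
corner_mask[3, 3] = True

edge_mask = np.zeros((4, 4), dtype=bool)
edge_mask[1, :] = True  # one horizontal edge across row 1

blob_mask = np.zeros((4, 4), dtype=bool)
blob_mask[1, 1] = True  # overlaps the edge, so it adds no new pixel
blob_mask[2, 2] = True

fused = fuse_masks([corner_mask, edge_mask, blob_mask])
density = keypoint_density(fused)
print(fused.astype(int))
print(f"keypoint density: {density:.4f}")
```

Under these assumptions, overlapping detections count once, so the fused mask grows only where a detector contributes pixels that the others missed. A mask of this kind could then serve as the per-pixel label for a segmentation-style network.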