The extraction of keypoints in images underlies many computer vision applications, from localization to 3D reconstruction. Keypoints come with a score that allows ranking them by quality. While learned keypoints often exhibit better properties than handcrafted ones, their scores are not easily interpretable, making it virtually impossible to compare the quality of individual keypoints across methods. We propose a framework that refines the keypoints extracted by any method and, at the same time, characterizes them with an interpretable score. Our approach leverages a modified robust Gaussian Mixture Model fit designed both to reject non-robust keypoints and to refine the remaining ones. Our score comprises two components: one relates to the probability of extracting the same keypoint in an image captured from another viewpoint; the other relates to the localization accuracy of the keypoint. These two interpretable components permit the comparison of individual keypoints extracted by different methods. Through extensive experiments we demonstrate that, when applied to popular keypoint detectors, our framework consistently improves the repeatability of keypoints as well as their performance in homography and two-/multiple-view pose recovery tasks.
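The abstract does not spell out the fit itself, but a Gaussian mixed with an explicit uniform outlier component is one standard way to realize "reject non-robust keypoints and refine the remaining ones". The sketch below is an illustration under that assumption, not the paper's actual formulation: it pools candidate keypoint locations, fits a single 2D Gaussian plus a uniform outlier component via EM, and returns per-point inlier responsibilities (for rejection), the fitted mean (the refined keypoint), and the covariance (a proxy for localization accuracy).

```python
import numpy as np

def robust_gaussian_fit(pts, area, n_iter=50, eps=1e-6):
    """EM fit of a 2D Gaussian plus a uniform outlier component (illustrative).

    pts:  (N, 2) candidate keypoint locations pooled from nearby detections.
    area: support area of the uniform outlier component (e.g. patch size).
    Returns the fitted mean, covariance, inlier weight, and per-point
    inlier responsibilities.
    """
    mu = pts.mean(axis=0)
    cov = np.cov(pts.T) + eps * np.eye(2)
    pi = 0.9                      # prior probability that a point is an inlier
    unif = 1.0 / area             # density of the uniform outlier component
    for _ in range(n_iter):
        # E-step: responsibility of the Gaussian (inlier) component
        diff = pts - mu
        inv = np.linalg.inv(cov)
        mah = np.einsum('ni,ij,nj->n', diff, inv, diff)
        norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
        g = pi * norm * np.exp(-0.5 * mah)
        r = g / (g + (1.0 - pi) * unif)
        # M-step: refit the Gaussian from responsibility-weighted points
        w = r.sum()
        mu = (r[:, None] * pts).sum(axis=0) / w
        diff = pts - mu
        cov = (r[:, None, None] *
               np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / w
        cov += eps * np.eye(2)
        pi = w / len(pts)
    return mu, cov, pi, r
```

In this reading, points whose responsibility falls below a threshold are discarded as non-robust, while the fitted mean refines the surviving keypoint and the covariance quantifies how precisely it is localized.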