Keypoint detection and matching is a fundamental task in many computer vision problems, from shape reconstruction and structure from motion to AR/VR applications and robotics. It is a well-studied problem with remarkable successes, such as SIFT and more recent deep learning approaches. While these techniques are highly robust to noise, illumination variation, and rigid motion transformations, their sensitivity to image distortion has received less attention. In this work, we focus on distortion caused by the geometry of the cameras used for image acquisition, and consider keypoint detection and matching in the hybrid scenario of a fisheye image paired with a projective image. We build on a state-of-the-art approach and derive a self-supervised procedure for training an interest point detector and descriptor network. We also collect two new datasets for additional training and testing in this unexplored scenario. We demonstrate that current approaches are suboptimal in this setting because they are designed for traditional projective conditions, while the proposed approach proves the most effective.