Learning feature correspondence is a foundational task in computer vision and is essential for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence. It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels. Specifically, we formulate correspondence learning as a bilevel optimization problem, which takes the reprojection error from bundle adjustment as a supervisory signal for the model. To avoid large memory and computation overhead, we leverage the stationary point of bundle adjustment to effectively back-propagate the implicit gradients through it. Through extensive experiments, we demonstrate superior performance on tasks including feature matching and pose estimation, obtaining an average accuracy gain of 30% over state-of-the-art matching models.
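The stationary-point idea can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes, for illustration only, that the upper and lower levels share one objective, and it replaces bundle adjustment with a scalar least-squares fit. The names `theta` (standing in for the model's predicted correspondences), `a`, and every function are hypothetical. Under that assumption, the derivative of the objective with respect to the solver output vanishes at the optimum, so the gradient with respect to `theta` can be taken with the solution held fixed, without differentiating through the solver.

```python
# Toy bilevel problem (illustrative assumption, not the paper's setup):
#   lower level: x*(theta) = argmin_x f(theta, x)
#   upper level: minimize f(theta, x*(theta)) over theta
# with f(theta, x) = 0.5 * sum_i (theta_i - a_i * x)^2.

def lower_level_solve(theta, a):
    """Closed-form minimizer over x; stands in for an iterative solver."""
    return sum(ai * ti for ai, ti in zip(a, theta)) / sum(ai * ai for ai in a)

def objective(theta, x, a):
    """Sum of squared residuals, a stand-in for reprojection error."""
    return 0.5 * sum((ti - ai * x) ** 2 for ti, ai in zip(theta, a))

def stationary_point_grad(theta, a):
    """Gradient w.r.t. theta with x* held fixed.

    At the stationary point df/dx = 0, so the term that would require
    differentiating x*(theta) through the solver drops out.
    """
    x_star = lower_level_solve(theta, a)
    return [ti - ai * x_star for ti, ai in zip(theta, a)]

def finite_difference_grad(theta, a, eps=1e-6):
    """Reference gradient that re-solves the lower level for each perturbation."""
    grad = []
    for j in range(len(theta)):
        plus = list(theta)
        plus[j] += eps
        minus = list(theta)
        minus[j] -= eps
        f_plus = objective(plus, lower_level_solve(plus, a), a)
        f_minus = objective(minus, lower_level_solve(minus, a), a)
        grad.append((f_plus - f_minus) / (2 * eps))
    return grad

a = [1.0, 2.0, -1.5, 0.5]
theta = [0.9, 2.3, -1.2, 0.1]
g_fast = stationary_point_grad(theta, a)
g_ref = finite_difference_grad(theta, a)
# Largest disagreement between the two gradients.
max_gap = max(abs(u - v) for u, v in zip(g_fast, g_ref))
```

The two gradients agree up to numerical precision here. This shortcut depends on the shared-objective assumption; when the upper-level loss differs from the lower-level one, an implicit-function-theorem correction term is generally needed.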