Learning feature correspondence is a foundational task in computer vision and is important for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence. It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels, heralding a new era for self-supervised correspondence learning. Specifically, we formulate correspondence learning as a bilevel optimization problem, which takes the reprojection error from bundle adjustment as the supervisory signal for the model. To avoid large memory and computation overhead, we leverage the stationary point to effectively back-propagate the implicit gradients through bundle adjustment. Through extensive experiments, we demonstrate superior performance on tasks including feature matching and pose estimation, obtaining an average accuracy gain of 30% over state-of-the-art matching models. This preprint corresponds to the Accepted Manuscript at the European Conference on Computer Vision (ECCV) 2024.
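The stationary-point idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's implementation: it replaces bundle adjustment with a small linear least-squares problem, replaces the matching network with a single scalar parameter `theta`, and assumes that the upper-level loss is the same residual that the lower level minimizes. All names (`A`, `B0`, `B1`, `solve_lower`, `stationary_point_grad`) are hypothetical. Under these assumptions, the derivative of the loss with respect to the lower-level solution vanishes at the optimum, so the gradient with respect to `theta` can be computed with the solution held fixed, without differentiating through the solver.

```python
import numpy as np

# Toy stand-in for bundle adjustment (an assumption for illustration only):
# a linear least-squares problem in x whose "observations" b(theta)
# depend on a scalar model parameter theta.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B0 = np.array([1.0, 2.0, 0.0])
B1 = np.array([0.5, -1.0, 2.0])


def observations(theta):
    # Hypothetical "model output": observations that vary linearly with theta.
    return B0 + theta * B1


def solve_lower(theta):
    # Lower level: x*(theta) = argmin_x ||A x - b(theta)||^2.
    x_star, *_ = np.linalg.lstsq(A, observations(theta), rcond=None)
    return x_star


def residual(x, theta):
    return A @ x - observations(theta)


def upper_loss(theta):
    # Upper level: the same squared residual, evaluated at the lower-level optimum.
    r = residual(solve_lower(theta), theta)
    return float(r @ r)


def stationary_point_grad(theta):
    # At the optimum, d(loss)/dx = 0, so only the direct dependence on theta
    # remains: d(loss)/d(theta) = 2 r^T (dr/dtheta), with dr/dtheta = -B1.
    x_star = solve_lower(theta)  # treated as a constant, not differentiated
    r = residual(x_star, theta)
    return float(2.0 * r @ (-B1))


def finite_difference_grad(theta, h=1e-5):
    # Reference gradient that re-solves the lower level at perturbed theta.
    return (upper_loss(theta + h) - upper_loss(theta - h)) / (2.0 * h)


theta = 0.7
g_fast = stationary_point_grad(theta)
g_ref = finite_difference_grad(theta)
print(g_fast, g_ref)
```

The two printed values should agree up to numerical error, which is the property that lets the solver be treated as a black box during back-propagation in this toy setting. Whether the same simplification holds in a real pipeline depends on the lower-level solver actually converging to a stationary point.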