This work introduces an effective and practical solution to the dense two-view structure from motion (SfM) problem. One vital question addressed is how to make careful use of per-pixel optical flow correspondences between two frames for accurate pose estimation, since perfect per-pixel correspondence between two images is difficult, if not impossible, to establish. With the carefully estimated camera pose and the predicted per-pixel optical flow correspondences, a dense depth map of the scene is computed. An iterative refinement procedure is then introduced to further improve optical flow matching confidence, camera pose, and depth by exploiting their inherent dependency in rigid SfM. The fundamental idea is to benefit from per-pixel uncertainty in the optical flow estimation and to make the dense SfM system robust via online refinement. Concretely, we introduce a pipeline consisting of (i) an uncertainty-aware dense optical flow estimation approach that provides per-pixel correspondences together with their matching confidence scores; (ii) a weighted dense bundle adjustment formulation that depends on optical flow uncertainty and bidirectional optical flow consistency to refine both pose and depth; (iii) a depth estimation network that considers its consistency with the estimated poses and with the optical flow, respecting the epipolar constraint. Extensive experiments show that the proposed approach achieves remarkable depth accuracy and state-of-the-art camera pose results, surpassing the accuracy of SuperPoint and SuperGlue when tested on benchmark datasets such as DeMoN, YFCC100M, and ScanNet.
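To make the role of flow uncertainty and bidirectional consistency more concrete, the following is a minimal sketch of how per-correspondence weights could down-weight unreliable matches when estimating relative pose. It is not the weighted dense bundle adjustment described above: as a simpler stand-in it uses a weighted eight-point estimate of the essential matrix on calibrated (normalized) image coordinates. The function names `fb_consistency_weights` and `weighted_eight_point`, the Gaussian form of the consistency weight, and the `sigma` parameter are all assumptions made for illustration.

```python
import numpy as np


def fb_consistency_weights(flow_fw, flow_bw_at_target, confidence, sigma=1.0):
    """Combine a confidence score with forward-backward flow consistency.

    flow_fw:           (N, 2) forward flow at the source points.
    flow_bw_at_target: (N, 2) backward flow sampled at the forward-warped points.
    confidence:        (N,) matching confidence in [0, 1] (assumed given).
    sigma:             hypothetical tolerance, in the same units as the flow.
    """
    # For a consistent match the forward and backward flows cancel out.
    fb_error = np.linalg.norm(flow_fw + flow_bw_at_target, axis=1)
    return confidence * np.exp(-(fb_error ** 2) / (2.0 * sigma ** 2))


def weighted_eight_point(x1, x2, weights):
    """Weighted eight-point estimate of E such that x2_h^T E x1_h is close to 0.

    x1, x2:  (N, 2) normalized image coordinates (intrinsics already removed).
    weights: (N,) non-negative weights, one per correspondence.
    """
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    ones = np.ones_like(u1)
    # Each row holds the coefficients of the row-major entries of E.
    A = np.stack(
        [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, ones], axis=1
    )
    # Scaling rows by sqrt(w) minimizes sum_i w_i * (a_i . e)^2.
    A = A * np.sqrt(weights)[:, None]
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    E = vt[-1].reshape(3, 3)
    # Project onto the essential-matrix manifold (two equal singular values).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

In a dense setting the correspondences would come from every pixel with a valid flow vector, and the resulting pose would typically serve only as an initialization for a joint refinement of pose and depth.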