Self-supervised multi-frame depth estimation achieves high accuracy by computing matching costs of pixel correspondences between adjacent frames, thereby injecting geometric information into the network. These pixel-correspondence candidates are computed from the relative pose estimates between the frames. Accurate pose predictions are essential for precise matching cost computation because they determine the epipolar geometry. In turn, improved depth estimates can be used to align the pose estimates. Inspired by traditional structure-from-motion (SfM) principles, we propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop. Our novel update pipeline uses a deep equilibrium model framework to iteratively refine the depth estimates and a hidden state of feature maps by computing local matching costs based on epipolar geometry. Importantly, we use the refined depth estimates and feature maps to compute pose updates at each step. These pose updates gradually alter the epipolar geometry during the refinement process. Experimental results on the KITTI dataset show competitive depth prediction and odometry prediction performance that surpasses published self-supervised baselines.
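To make the role of the relative pose concrete, here is a minimal sketch of how depth hypotheses for one target pixel map to correspondence candidates in an adjacent (source) frame, and how a simple matching score can be computed for them. It assumes a pinhole camera with intrinsics `K` and a relative pose `(R, t)` that maps target-frame 3D points into the source frame. The function names are hypothetical, and the nearest-neighbour sampling and dot-product correlation are simplifications chosen for illustration, not DualRefine's actual cost formulation. Because every candidate is the projection of a point on the same viewing ray, the candidates lie on the epipolar line, so an error in `(R, t)` shifts all of them.

```python
import numpy as np


def local_depth_hypotheses(d_cur, num=5, spread=0.2):
    """Hypothetical helper: sample depths in a local band around the current estimate."""
    return d_cur * np.linspace(1.0 - spread, 1.0 + spread, num)


def project_candidates(u, v, depths, K, R, t):
    """Project target pixel (u, v) into the source frame for each depth hypothesis.

    Assumes a pinhole model: p_src ~ K (R * d * K^-1 [u, v, 1]^T + t).
    Returns a (D, 2) array of source-pixel candidates along the epipolar line.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-projected viewing ray (z = 1)
    points_tgt = depths[:, None] * ray[None, :]       # (D, 3) points in the target camera frame
    points_src = points_tgt @ R.T + t[None, :]        # (D, 3) points in the source camera frame
    pix = points_src @ K.T                            # (D, 3) homogeneous source pixels
    return pix[:, :2] / pix[:, 2:3]                   # perspective division


def matching_scores(feat_tgt, feat_src, u, v, candidates):
    """Dot-product correlation between the target feature and source features at each candidate.

    Uses nearest-neighbour sampling (a simplification); candidates outside the
    source image get a score of -inf. Higher score means a better match.
    """
    H, W, _ = feat_src.shape
    f_t = feat_tgt[v, u]
    scores = np.full(len(candidates), -np.inf)
    for i, (x, y) in enumerate(candidates):
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < W and 0 <= yi < H:
            scores[i] = f_t @ feat_src[yi, xi]
    return scores
```

For example, with identity rotation and a sideways translation `t = [0.1, 0, 0]`, the centre pixel of a 100x100 image with focal length 100 maps to source x-coordinates of 60, 55 and 51 for depths 1, 2 and 10: the candidates slide along a horizontal epipolar line, and the depth whose candidate has the best feature match wins.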
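The depth-pose feedback loop can be illustrated with a plain fixed-point iteration, which is the forward solve at the heart of a deep equilibrium model: iterate until the state stops changing. In this sketch, `depth_update` and `pose_update` are hypothetical stand-ins for learned networks; the toy versions below are simple linear contractions chosen only so that the loop provably converges to a known fixed point. A real deep equilibrium model would typically use a dedicated root solver and implicit differentiation for training, neither of which is shown here, and the names `refine`, `toy_depth_update` and `toy_pose_update` are illustrative assumptions.

```python
import numpy as np


def refine(depth0, pose0, depth_update, pose_update, tol=1e-6, max_iters=100):
    """Alternate depth and pose updates until a fixed point is reached.

    Each step refines depth under the current pose, then updates the pose from
    the refined depth (the feedback loop). Returns (depth, pose, iterations).
    """
    depth, pose = depth0, pose0
    for k in range(max_iters):
        new_depth = depth_update(depth, pose)      # depth refinement under the current pose
        new_pose = pose_update(new_depth, pose)    # pose update from the refined depth
        residual = max(np.max(np.abs(new_depth - depth)),
                       np.max(np.abs(new_pose - pose)))
        depth, pose = new_depth, new_pose
        if residual < tol:                         # equilibrium reached (within tolerance)
            return depth, pose, k + 1
    return depth, pose, max_iters


def toy_depth_update(depth, pose):
    """Hypothetical contractive depth update (stand-in for a learned network)."""
    return 0.5 * depth + 0.25 * pose + 1.0


def toy_pose_update(depth, pose):
    """Hypothetical contractive pose update driven by the mean refined depth."""
    return 0.5 * pose + 0.1 * depth.mean()
```

With these toy updates the fixed point satisfies d = 0.5p + 2 and p = 0.2d, i.e. d = 2/0.9 and p = 0.4/0.9, and `refine(np.ones((2, 2)), np.zeros(1), toy_depth_update, toy_pose_update)` reaches it within a few dozen iterations. The key design point mirrored from the text is ordering: the pose is recomputed from the *refined* depth at every step, so the geometry used by the next depth update keeps shifting until both settle.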