Recent advances in monocular depth estimation (MDE) methods and their improved accuracy open new possibilities for their application. In this paper, we investigate how monocular depth estimates can be used for relative pose estimation. In particular, we ask whether using monocular depth improves results over traditional point-based methods. We propose a novel framework for estimating the relative pose of two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined only up to an unknown scale, or up to both an unknown scale and an unknown shift, our solvers estimate the scale, or the scale and shift, jointly with the relative pose. We derive efficient solvers for these different types of depth in three camera configurations: (1) two calibrated cameras, (2) two cameras with an unknown shared focal length, and (3) two cameras with unknown, different focal lengths. Our new solvers outperform state-of-the-art depth-aware solvers in terms of both speed and accuracy. Based on extensive real-world experiments on multiple datasets and with various MDE methods, we discuss which depth-aware solvers are preferable in which situation. The code will be made publicly available.
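To make the scale ambiguity concrete, the following is a minimal sketch of the simplest case only: two calibrated cameras and depths that differ by an unknown scale, with no shift. It is not the solvers proposed in this paper; it swaps in the classical least-squares similarity alignment of Umeyama on back-projected points. The function names `backproject` and `align_with_scale` are hypothetical, and the sketch assumes noise-free correspondences and a known intrinsic matrix `K`.

```python
import numpy as np


def backproject(points_2d, depths, K):
    """Lift pixel coordinates (N x 2) to 3D camera-frame points using per-point depths."""
    ones = np.ones((points_2d.shape[0], 1))
    # Rays with unit z-component: K^{-1} [u, v, 1]^T
    rays = (np.linalg.inv(K) @ np.hstack([points_2d, ones]).T).T
    return rays * depths[:, None]


def align_with_scale(X1, X2):
    """Least-squares s, R, t such that X2 ~ s * R @ X1 + t (Umeyama alignment).

    Illustrative stand-in, not the paper's solver: s absorbs the unknown
    ratio between the scales of the two depth maps.
    """
    n = X1.shape[0]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    A, B = X1 - mu1, X2 - mu2
    cov = B.T @ A / n                      # cross-covariance of the two point sets
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                     # guard against a reflection
    R = U @ D @ Vt
    var1 = (A ** 2).sum() / n              # variance of the first point set
    s = np.trace(np.diag(S) @ D) / var1
    t = mu2 - s * R @ mu1
    return s, R, t
```

In this sketch the recovered translation is expressed in the units of the second depth map, so its magnitude is only meaningful up to that map's unknown scale. Handling an additional unknown shift, or unknown focal lengths, requires the dedicated solvers described above.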