In this paper, we address the problem of wide-baseline camera pose estimation from a group of 360$^\circ$ panoramas under an upright-camera assumption. Recent work has demonstrated the merit of deep learning for end-to-end direct relative pose regression from 360$^\circ$ panorama pairs [11]. To exploit the benefits of multi-view logic in a learning-based framework, we introduce Graph-CoVis, which non-trivially extends CoVisPose [11] from relative two-view to global multi-view spherical camera pose estimation. Graph-CoVis is a novel Graph Neural Network based architecture that jointly learns the co-visible structure and global motion in an end-to-end, fully supervised manner. Using the ZInD [4] dataset, which features real homes presenting wide baselines, occlusion, and limited visual overlap, we show that our model performs competitively with state-of-the-art approaches.