We present a technique and benchmark dataset for estimating the relative 3D orientation between a pair of Internet images captured in an extreme setting, where the images have limited or non-overlapping fields of view. Prior work targeting extreme rotation estimation assumes constrained 3D environments and emulates perspective images by cropping regions from panoramic views. However, real images captured in the wild are highly diverse, exhibiting variation in both appearance and camera intrinsics. In this work, we propose a Transformer-based method for estimating relative rotations in extreme real-world settings, and contribute the ExtremeLandmarkPairs dataset, assembled from scene-level Internet photo collections. Our evaluation demonstrates that our approach succeeds in estimating the relative rotations in a wide variety of extreme-view Internet image pairs, outperforming various baselines, including dedicated rotation estimation techniques and contemporary 3D reconstruction methods.