Computer vision has long relied on two kinds of correspondences: pixel correspondences in images and 3D correspondences on object surfaces. Is there another kind, and if so, what can it do for us? In this paper, we introduce a third kind of correspondence, which we call reflection correspondences, and show that they can help estimate camera pose just by looking at objects, without relying on the background. Reflection correspondences are point correspondences in the reflected world, i.e., the scene reflected by the object surface. The object geometry and reflectance alter the scene geometrically and radiometrically, respectively, causing incorrect pixel correspondences. Geometry recovered from each image is also hampered by distortions, namely the generalized bas-relief ambiguity, leading to erroneous 3D correspondences. We show that reflection correspondences can resolve the ambiguities arising from these distortions. We introduce a neural correspondence estimator and a RANSAC algorithm that fully leverages all three kinds of correspondences for robust and accurate joint camera pose and object shape estimation from the object appearance alone. The method expands the horizon of numerous downstream tasks, including camera pose estimation for appearance modeling (e.g., NeRF) and motion estimation of reflective objects (e.g., cars on the road), as it removes the requirement of an overlapping background.
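To make the idea of a reflection correspondence concrete, the following is a minimal sketch and not the method of this paper. It assumes a perfect mirror surface, an orthographic camera looking along the z axis, and a distant scene, so that two pixels in different views that reflect the same scene point have reflection directions related by the relative camera rotation alone. The function names (`reflect`, `kabsch`, `ransac_rotation`), the 2-degree inlier threshold, and the iteration count are all illustrative assumptions. The rotation is fitted with a generic Kabsch solve inside a plain RANSAC loop.

```python
import numpy as np

def reflect(normals, view=np.array([0.0, 0.0, 1.0])):
    """Mirror-reflection directions r = 2 (n . v) n - v.

    normals: (N, 3) surface normals; view: unit vector toward the camera
    (assumed orthographic, so it is the same for every pixel).
    """
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return 2.0 * (n @ view)[:, None] * n - view

def kabsch(a, b):
    """Least-squares rotation R such that R @ a[i] is close to b[i]."""
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so that det(R) = +1 (no mirroring).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def ransac_rotation(a, b, iters=200, thresh_deg=2.0, seed=0):
    """Robustly fit a rotation between unit-direction pairs (a[i], b[i]).

    Two non-parallel direction pairs determine a rotation, so each
    hypothesis is fitted to a random sample of two pairs.
    """
    rng = np.random.default_rng(seed)
    cos_t = np.cos(np.deg2rad(thresh_deg))
    best = np.zeros(len(a), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(a), size=2, replace=False)
        R = kabsch(a[idx], b[idx])
        # A pair is an inlier if the rotated direction is within the
        # angular threshold of its partner.
        inliers = np.sum((a @ R.T) * b, axis=1) > cos_t
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all inliers of the best hypothesis.
    return kabsch(a[best], b[best]), best
```

In this toy setting, `a` would hold reflection directions computed from the normals seen in the first view and `b` the directions at the matched pixels in the second view. The sketch recovers only the relative rotation from reflection directions; combining pixel, 3D, and reflection correspondences and resolving the bas-relief ambiguity, as described in the abstract, is outside its scope.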