We consider the visual disambiguation task of determining whether a pair of visually similar images depict the same or distinct 3D surfaces (e.g., the same or opposite sides of a symmetric building). Illusory image matches, where two images observe distinct but visually similar 3D surfaces, can be challenging for humans to differentiate, and can also lead 3D reconstruction algorithms to produce erroneous results. We propose a learning-based approach to visual disambiguation, formulating it as a binary classification task on image pairs. To that end, we introduce a new dataset for this problem, Doppelgangers, which includes image pairs of similar structures with ground truth labels. We also design a network architecture that takes the spatial distribution of local keypoints and matches as input, allowing for better reasoning about both local and global cues. Our evaluation shows that our method can distinguish illusory matches in difficult cases, and can be integrated into SfM pipelines to produce correct, disambiguated 3D reconstructions. See our project page for our code, datasets, and more results: http://doppelgangers-3d.github.io/.
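The input representation mentioned above, the spatial distribution of keypoints and matches, can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that keypoints and matched keypoints are rasterized into binary masks and stacked as channels for a pairwise binary classifier. The function names `rasterize_points` and `build_pair_input`, the channel order, and the mask encoding are hypothetical and chosen only for illustration; they are not the authors' implementation.

```python
import numpy as np


def rasterize_points(points, height, width):
    """Return a (height, width) float mask with 1.0 at each (x, y) point."""
    mask = np.zeros((height, width), dtype=np.float32)
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    if pts.shape[0] == 0:
        return mask
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.round(pts[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.round(pts[:, 1]).astype(int), 0, height - 1)
    mask[rows, cols] = 1.0
    return mask


def build_pair_input(kpts_a, kpts_b, matches, height, width):
    """Stack keypoint and match masks of an image pair into a (4, H, W) array.

    kpts_a, kpts_b: (x, y) keypoint locations in image A and image B.
    matches: (i, j) index pairs, meaning kpts_a[i] is matched to kpts_b[j].
    Assumed channel order: A keypoints, A matches, B keypoints, B matches.
    """
    kpts_a = np.asarray(kpts_a, dtype=np.float64).reshape(-1, 2)
    kpts_b = np.asarray(kpts_b, dtype=np.float64).reshape(-1, 2)
    matches = np.asarray(matches, dtype=int).reshape(-1, 2)
    matched_a = kpts_a[matches[:, 0]]
    matched_b = kpts_b[matches[:, 1]]
    return np.stack([
        rasterize_points(kpts_a, height, width),
        rasterize_points(matched_a, height, width),
        rasterize_points(kpts_b, height, width),
        rasterize_points(matched_b, height, width),
    ])
```

A classifier would then map such an array (possibly alongside the images themselves) to a single probability that the pair shows the same 3D surface. The masks preserve where matches fall in each image, which is the kind of spatial cue the abstract refers to.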