Multiview Structure from Motion is a fundamental and challenging computer vision problem. A recent deep learning-based approach used matrix-equivariant architectures to simultaneously recover camera poses and 3D scene structure from large image collections. That work, however, made the unrealistic assumption that the input point tracks are free of outliers. Here we propose an architecture suited to handling outliers by adding an inlier/outlier classification module that respects the model's equivariance and by adding a robust bundle adjustment step. Experiments demonstrate that our method can be successfully applied in realistic settings that include large image collections and point tracks that are extracted with common heuristics and contain many outliers.