Robust estimation of the essential matrix, which encodes the relative position and orientation of two cameras, is a fundamental step in structure-from-motion pipelines. Recent deep learning-based methods have achieved accurate estimation by using complex network architectures that involve graphs, attention layers, and hard pruning steps. Here, we propose a simpler network architecture based on Deep Sets. Given a collection of point matches extracted from two images, our method identifies outlier point matches and models the displacement noise in inlier matches. A weighted DLT (direct linear transform) module then uses these predictions to regress the essential matrix. Our network recovers the essential matrix more accurately than existing networks with significantly more complex architectures.
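To make the weighted DLT step concrete, the following is a minimal sketch of a classical weighted eight-point solve, not the paper's actual module. It assumes the matches are given in normalized (calibrated) image coordinates, and that `w` holds per-match weights standing in for the network's inlier predictions. The function name `weighted_dlt_essential` is hypothetical, and the displacement-noise correction mentioned above is not modeled here.

```python
import numpy as np


def weighted_dlt_essential(x1, x2, w):
    """Weighted eight-point (DLT) estimate of the essential matrix.

    x1, x2: (N, 2) arrays of matched points in normalized image coordinates.
    w:      (N,) non-negative weights (assumed here to come from a network).
    Returns a 3x3 matrix E with singular values (1, 1, 0), defined up to sign.
    """
    ones = np.ones((len(x1), 1))
    x1h = np.hstack([x1, ones])  # homogeneous points in image 1
    x2h = np.hstack([x2, ones])  # homogeneous points in image 2

    # Each match gives one linear constraint x2h^T E x1h = 0 on the
    # row-major vectorization of E: the row is the outer product of x2h and x1h.
    A = (x2h[:, :, None] * x1h[:, None, :]).reshape(-1, 9)

    # Down-weight unreliable matches; a weight of zero removes the constraint.
    A = w[:, None] * A

    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of the weighted system.
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)

    # Project onto the set of essential matrices (two equal singular values
    # and one zero singular value).
    u, _, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt
```

With at least eight well-weighted inlier matches in general position, the result should agree with the true essential matrix up to scale and sign. In a learned pipeline the solve would typically be written with a differentiable SVD so that gradients can reach the weights; that is a design assumption of this sketch rather than something stated in the text.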