Camera pose estimation for two-view geometry traditionally relies on RANSAC. Normally, a multitude of image correspondences leads to a pool of proposed hypotheses, which are then scored to find a winning model. The inlier count is generally regarded as a reliable indicator of "consensus". We examine this scoring heuristic and find that it favors disappointing models under certain circumstances. As a remedy, we propose the Fundamental Scoring Network (FSNet), which infers a score for a pair of overlapping images and any proposed fundamental matrix. It does not rely on sparse correspondences, but rather embodies a two-view geometry model through an epipolar attention mechanism that predicts the pose error of the two images. FSNet can be incorporated into traditional RANSAC loops. We evaluate FSNet on fundamental and essential matrix estimation on indoor and outdoor datasets, and establish that FSNet can successfully identify good poses for pairs of images with few or unreliable correspondences. In addition, we show that naively combining FSNet with the MAGSAC++ scoring approach achieves state-of-the-art results.
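For concreteness, the inlier-count heuristic examined above can be sketched as follows. This is a minimal illustration of the classical baseline, not of FSNet itself. It assumes correspondences are given as pixel coordinates, uses the Sampson distance as the residual, and takes a 1-pixel threshold; the function names (`sampson_distance`, `inlier_count`, `select_best`) and the threshold value are hypothetical choices made for this example.

```python
import numpy as np


def sampson_distance(F, pts1, pts2):
    """Squared Sampson distance of each correspondence under fundamental matrix F.

    pts1, pts2: (N, 2) arrays of matching pixel coordinates in image 1 and image 2.
    """
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])  # (N, 3) homogeneous points in image 1
    x2 = np.hstack([pts2, ones])  # (N, 3) homogeneous points in image 2
    Fx1 = x1 @ F.T                # row i holds F @ x1_i (epipolar line in image 2)
    Ftx2 = x2 @ F                 # row i holds F.T @ x2_i (epipolar line in image 1)
    num = np.sum(x2 * Fx1, axis=1) ** 2  # (x2^T F x1)^2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / np.maximum(den, 1e-12)  # guard against division by zero


def inlier_count(F, pts1, pts2, threshold=1.0):
    """Score a hypothesis by counting correspondences within the threshold (assumed, in pixels)."""
    return int(np.sum(sampson_distance(F, pts1, pts2) < threshold ** 2))


def select_best(hypotheses, pts1, pts2, threshold=1.0):
    """Return the index of the hypothesis with the most inliers, plus all scores."""
    scores = [inlier_count(F, pts1, pts2, threshold) for F in hypotheses]
    return int(np.argmax(scores)), scores
```

Because the score depends only on the supplied correspondences, it carries little information when matches are few or unreliable, which is the regime the abstract targets. A learned scorer such as FSNet would replace or complement `inlier_count` at the scoring step of the loop.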