NOPE-SAC: Neural One-Plane RANSAC for Sparse-View Planar 3D Reconstruction

This paper studies the challenging two-view 3D reconstruction in a rigorous sparse-view configuration, which is suffering from insufficient correspondences in the input image pairs for camera pose estimation. We present a novel Neural One-PlanE RANSAC framework (termed NOPE-SAC in short) that exerts excellent capability to learn one-plane pose hypotheses from 3D plane correspondences. Building on the top of a siamese plane detection network, our NOPE-SAC first generates putative plane correspondences with a coarse initial pose. It then feeds the learned 3D plane parameters of correspondences into shared MLPs to estimate the one-plane camera pose hypotheses, which are subsequently reweighed in a RANSAC manner to obtain the final camera pose. Because the neural one-plane pose minimizes the number of plane correspondences for adaptive pose hypotheses generation, it enables stable pose voting and reliable pose refinement in a few plane correspondences for the sparse-view inputs. In the experiments, we demonstrate that our NOPE-SAC significantly improves the camera pose estimation for the two-view inputs with severe viewpoint changes, setting several new state-of-the-art performances on two challenging benchmarks, i.e., MatterPort3D and ScanNet, for sparse-view 3D reconstruction. The source code is released at https://github.com/IceTTTb/NopeSAC for reproducible research.

翻译：本文研究了在严格稀疏视角配置下具有挑战性的两视图三维重建问题，该问题中因输入图像对缺乏足够对应点而难以进行相机位姿估计。我们提出了一种新颖的神经单平面RANSAC框架（简称NOPE-SAC），该框架具备从三维平面对应关系学习单平面位姿假设的卓越能力。基于孪生平面检测网络架构，NOPE-SAC首先利用粗略初始位姿生成候选平面对应关系，随后将学习到的对应关系三维平面参数输入共享多层感知机以估计单平面相机位姿假设，最后通过RANSAC方式对这些假设进行加权重估以获得最终相机位姿。由于神经单平面位姿能通过最小化平面对应数量实现自适应位姿假设生成，该方法在稀疏视角输入下仅需少量平面对应即可实现稳定的位姿投票与可靠的位姿优化。实验表明，我们的NOPE-SAC显著提升了存在剧烈视角变化的两视图输入的相机位姿估计精度，在MatterPort3D和ScanNet两个具有挑战性的基准测试中，针对稀疏视角三维重建任务创造了多项新的最优性能记录。源代码已发布于https://github.com/IceTTTb/NopeSAC以保障研究的可复现性。

相关内容

三维重建

关注 1174

在计算机视觉中, 三维重建是指根据单视图或者多视图的图像重建三维信息的过程. 由于单视频的信息不完全,因此三维重建需要利用经验知识. 而多视图的三维重建(类似人的双目定位)相对比较容易, 其方法是先对摄像机进行标定, 即计算出摄像机的图象坐标系与世界坐标系的关系.然后利用多个二维图象中的信息重建出三维信息。物体三维重建是计算机辅助几何设计(CAGD)、计算机图形学(CG)、计算机动画、计算机视觉、医学图像处理、科学计算和虚拟现实、数字媒体创作等领域的共性科学问题和核心技术。在计算机内生成物体三维表示主要有两类方法。一类是使用几何建模软件通过人机交互生成人为控制下的物体三维几何模型,另一类是通过一定的手段获取真实物体的几何形状。前者实现技术已经十分成熟,现有若干软件支持,比如:3DMAX、Maya、AutoCAD、UG等等,它们一般使用具有数学表达式的曲线曲面表示几何形状。后者一般称为三维重建过程,三维重建是指利用二维投影恢复物体三维信息(形状等)的数学过程和计算机技术,包括数据获取、预处理、点云拼接和特征分析等步骤。

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日