While recent advances in neural radiance fields enable realistic digitization of large-scale scenes, the image-capturing process remains time-consuming and labor-intensive. Previous works attempt to automate this process using a Next-Best-View (NBV) policy for active 3D reconstruction. However, existing NBV policies rely heavily on hand-crafted criteria, limited action spaces, or per-scene optimized representations. These constraints limit their cross-dataset generalizability. To overcome them, we propose GenNBV, an end-to-end generalizable NBV policy. Our policy adopts a reinforcement learning (RL)-based framework and extends the typical limited action space to 5D free space. This empowers our agent drone to scan from any viewpoint and even interact with geometries unseen during training. To boost cross-dataset generalizability, we also propose a novel multi-source state embedding comprising geometric, semantic, and action representations. We establish a benchmark using the Isaac Gym simulator with the Houses3K and OmniObject3D datasets to evaluate this NBV policy. Experiments demonstrate that our policy achieves 98.26% and 97.12% coverage ratios on unseen building-scale objects from these two datasets, respectively, outperforming prior solutions.