Bag of Views: An Appearance-based Approach to Next-Best-View Planning for 3D Reconstruction

UAV-based intelligent data acquisition for 3D reconstruction and monitoring of infrastructure has been experiencing an increasing surge of interest due to the recent advancements in image processing and deep learning-based techniques. View planning is an essential part of this task that dictates the information capture strategy and heavily impacts the quality of the 3D model generated from the captured data. Recent methods have used prior knowledge or partial reconstruction of the target to accomplish view planning for active reconstruction; the former approach poses a challenge for complex or newly identified targets while the latter is computationally expensive. In this work, we present Bag-of-Views (BoV), a fully appearance-based model used to assign utility to the captured views for both offline dataset refinement and online next-best-view (NBV) planning applications targeting the task of 3D reconstruction. With this contribution, we also developed the View Planning Toolbox (VPT), a lightweight package for training and testing machine learning-based view planning frameworks, custom view dataset generation of arbitrary 3D scenes, and 3D reconstruction. Through experiments which pair a BoV-based reinforcement learning model with VPT, we demonstrate the efficacy of our model in reducing the number of required views for high-quality reconstructions in dataset refinement and NBV planning.

翻译：基于无人机的智能数据采集在基础设施三维重建与监测领域正受到日益广泛的关注，这得益于图像处理与深度学习技术的进步。视角规划是该任务的核心环节，它决定了信息采集策略，并深刻影响着所采集数据生成的三维模型质量。现有方法或依赖先验知识，或利用目标的部分重建结果来实现主动重建的视角规划：前者对复杂或新识别目标构成挑战，后者则计算成本高昂。本研究提出"视角集"（Bag-of-Views, BoV）——一种全外观模型，用于为已采集视角分配效用值，可应用于离线数据集优化与面向三维重建的在线下一最佳视角（NBV）规划。基于此贡献，我们还开发了视角规划工具箱（View Planning Toolbox, VPT），这是一个轻量级工具包，支持基于机器学习的视角规划框架的训练与测试、任意三维场景的自定义视角数据集生成以及三维重建。通过将基于BoV的强化学习模型与VPT结合进行实验，我们证明了该模型在数据集优化与NBV规划中，能有效减少高质量重建所需的视角数量。

相关内容

三维重建

关注 1174

在计算机视觉中, 三维重建是指根据单视图或者多视图的图像重建三维信息的过程. 由于单视频的信息不完全,因此三维重建需要利用经验知识. 而多视图的三维重建(类似人的双目定位)相对比较容易, 其方法是先对摄像机进行标定, 即计算出摄像机的图象坐标系与世界坐标系的关系.然后利用多个二维图象中的信息重建出三维信息。物体三维重建是计算机辅助几何设计(CAGD)、计算机图形学(CG)、计算机动画、计算机视觉、医学图像处理、科学计算和虚拟现实、数字媒体创作等领域的共性科学问题和核心技术。在计算机内生成物体三维表示主要有两类方法。一类是使用几何建模软件通过人机交互生成人为控制下的物体三维几何模型,另一类是通过一定的手段获取真实物体的几何形状。前者实现技术已经十分成熟,现有若干软件支持,比如:3DMAX、Maya、AutoCAD、UG等等,它们一般使用具有数学表达式的曲线曲面表示几何形状。后者一般称为三维重建过程,三维重建是指利用二维投影恢复物体三维信息(形状等)的数学过程和计算机技术,包括数据获取、预处理、点云拼接和特征分析等步骤。

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日