While recent advances in neural radiance fields enable realistic digitization of large-scale scenes, the image-capturing process remains time-consuming and labor-intensive. Previous works attempt to automate this process using a Next-Best-View (NBV) policy for active 3D reconstruction. However, existing NBV policies rely heavily on hand-crafted criteria, limited action spaces, or per-scene optimized representations, and these constraints limit their cross-dataset generalizability. To overcome them, we propose GenNBV, an end-to-end generalizable NBV policy. Our policy adopts a reinforcement learning (RL)-based framework and extends the typically limited action space to a 5D free space. This allows our drone agent to scan from any viewpoint and even to interact with unseen geometries during training. To boost cross-dataset generalizability, we also propose a novel multi-source state embedding that combines geometric, semantic, and action representations. We establish a benchmark using the Isaac Gym simulator with the Houses3K and OmniObject3D datasets to evaluate this NBV policy. Experiments demonstrate that our policy achieves coverage ratios of 98.26% and 97.12% on unseen building-scale objects from these two datasets, respectively, outperforming prior solutions.
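The abstract reports results as a coverage ratio but does not define the metric. As a rough illustration only, the sketch below assumes a point-based definition: the fraction of ground-truth surface points that have at least one scanned point within a distance threshold. The function name `coverage_ratio`, the `threshold` parameter, and its default value are hypothetical and are not taken from the paper.

```python
import math


def coverage_ratio(gt_points, scanned_points, threshold=0.05):
    """Fraction of ground-truth points lying within `threshold` of any scanned point.

    Assumed definition for illustration; the paper's exact metric may differ.
    Points are sequences of coordinates, e.g. (x, y, z).
    """
    if not gt_points:
        raise ValueError("gt_points must not be empty")
    if not scanned_points:
        return 0.0  # nothing has been scanned yet
    # Brute-force nearest-neighbour search: fine for a small example,
    # a spatial index would be needed for real point clouds.
    covered = sum(
        1
        for g in gt_points
        if min(math.dist(g, s) for s in scanned_points) <= threshold
    )
    return covered / len(gt_points)


# Four ground-truth points on a line; only the first two have been scanned.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
scanned = [(0.0, 0.0, 0.01), (1.02, 0.0, 0.0)]
ratio = coverage_ratio(gt, scanned)
```

Under this assumed definition, an NBV policy is judged by how quickly and how completely its chosen viewpoints raise this ratio toward 1.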