Commonly available prior information, such as BIM models, floor plans, and remote sensing images, can provide valuable geometric and semantic context for autonomous robotic systems. In this paper, we treat observations from fixed external RGB cameras as Common Prior Maps (CPMs): wide-field views of the environment that initialize a semantic and geometric scene prior before any robot motion begins. We present an RGB-only framework for active, incremental 3D scene graph (3DSG) generation that fuses observations from both onboard robot cameras and fixed external cameras within a single hardware-agnostic pipeline. Because the system relies solely on RGB observations processed by a feed-forward 3D reconstruction model, it treats all cameras, whether onboard or external, identically and requires no hardware modifications. A graph-based active semantic exploration framework then uses the partial scene graph directly to guide the robot toward regions of high semantic uncertainty, progressively completing and refining the prior. Experiments demonstrate that bootstrapping the scene graph with even a single external camera increases initial object recall by up to 79%, and that the richer context of the prior significantly improves the efficiency of subsequent active exploration.