We develop an approach for active semantic perception, i.e., using the semantics of a scene to guide tasks such as exploration. We build a compact, multi-layer scene graph that can represent large, complex indoor environments at several levels of abstraction, with nodes for rooms, objects, walls, windows, etc., as well as fine-grained details of their geometry. We develop a procedure based on large language models (LLMs) to sample plausible scene graphs of unobserved regions that are consistent with partial observations of the scene. We then compute the information gain of a candidate waypoint using this scene graph, which enables sophisticated spatial reasoning: for example, of the two doors leading out of the living room, one probably leads to the kitchen and the other to the bedroom. We evaluate our approach in realistic 3D indoor apartments in simulation and on a Unitree Go 2 robot in the real world. Qualitative and quantitative analyses show that our approach pins down high-level and low-level semantic information in the environment more quickly and more accurately than existing approaches.
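The idea of scoring waypoints by information gain over sampled scene graphs can be illustrated with a small sketch. This is not the authors' procedure. It is a simplified illustration under several assumptions: the scene graph keeps only two layers (rooms and the doors connecting them), the LLM sampling step is replaced by a hard-coded list of hypothetical completions, samples that contradict what has already been observed are discarded, and visiting a door is assumed to reveal the label of the room behind it deterministically. Under that last assumption, the information gain of a waypoint equals the entropy of the predicted observation across samples. All names (`SceneGraph`, `consistent`, `information_gain`, `make_sample`, the room and door ids) are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class SceneGraph:
    """Hypothetical two-layer graph: labeled room nodes plus door edges between rooms."""
    rooms: dict[str, str]               # room id -> semantic label
    doors: dict[str, tuple[str, str]]   # door id -> pair of room ids it connects

    def room_beyond(self, door_id: str, from_room: str) -> str:
        """Label of the room on the far side of door_id, seen from from_room."""
        a, b = self.doors[door_id]
        return self.rooms[b if a == from_room else a]


def consistent(sample: SceneGraph, observed: dict[str, str]) -> bool:
    """Keep only samples that agree with every room label already observed."""
    return all(sample.rooms.get(rid) == label for rid, label in observed.items())


def information_gain(samples: list[SceneGraph], door_id: str, from_room: str) -> float:
    """Entropy (bits) of the predicted room label behind door_id across samples.

    Assumes that reaching the door deterministically reveals that label, so the
    mutual information between hypothesis and observation equals this entropy.
    """
    if not samples:
        return 0.0
    counts = Counter(s.room_beyond(door_id, from_room) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def make_sample(r0: str, r1: str, r2: str) -> SceneGraph:
    # Stand-in for one LLM-sampled completion; rooms r1 and r2 are unobserved.
    return SceneGraph(rooms={"r0": r0, "r1": r1, "r2": r2},
                      doors={"d1": ("r0", "r1"), "d2": ("r0", "r2")})


observed = {"r0": "living room"}
raw = [make_sample("living room", "kitchen", "bedroom"),
       make_sample("living room", "kitchen", "bathroom"),
       make_sample("living room", "kitchen", "bedroom"),
       make_sample("living room", "dining room", "bathroom"),
       make_sample("office", "kitchen", "kitchen")]  # contradicts the observation
samples = [s for s in raw if consistent(s, observed)]

# Score each candidate waypoint (door) and pick the most informative one.
ig = {d: information_gain(samples, d, "r0") for d in ("d1", "d2")}
best = max(ig, key=ig.get)
print(ig, best)
```

With these made-up samples, door `d2` (evenly split between bedroom and bathroom) scores 1 bit, while `d1` (mostly kitchen) scores about 0.81 bits, so `d2` is the more informative waypoint. A fuller version would weight samples by likelihood and model noisy observations, in which case the entropy shortcut no longer applies.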