The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic, discriminative models to complete semantic maps, overlooking the inherent uncertainty of indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with full-scene semantic maps enriched by large language models (LLMs). During training, spatial priors inferred from the LLMs are encoded as two-dimensional Gaussian fields and injected into the target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson, and generalizes strongly when transferred to HM3D. Code and pretrained models are available at https://github.com/Badi-Li/GOAL.
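To make the prior-injection step concrete, the sketch below shows one plausible way to render LLM-suggested object locations as two-dimensional Gaussian fields and blend them into per-category channels of a semantic map. This is a minimal illustration, not the official GOAL implementation: the function names, map shapes, and the max-blending rule are assumptions for exposition; see the repository for the actual code.

```python
# Minimal sketch (not the official GOAL code): rendering LLM-derived
# spatial priors as 2D Gaussian fields and injecting them into a
# semantic target map. All names, shapes, and parameters are
# illustrative assumptions.
import numpy as np

def gaussian_field(h, w, center, sigma):
    """Render an (h, w) 2D Gaussian bump centered at `center` = (y, x)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def inject_priors(target_map, priors, sigma=5.0):
    """Blend Gaussian prior fields into per-category map channels.

    target_map: (C, H, W) semantic map, one channel per object category.
    priors: list of (category_index, (y, x)) locations suggested by an
            LLM, e.g. "a TV is likely near the sofa" grounded to map
            coordinates by some upstream step (assumed here).
    """
    num_cats, h, w = target_map.shape
    enriched = target_map.copy()
    for cat, center in priors:
        assert 0 <= cat < num_cats
        field = gaussian_field(h, w, center, sigma)
        # Element-wise max, so observed evidence is never weakened.
        enriched[cat] = np.maximum(enriched[cat], field)
    return enriched

# Toy usage: inject a prior for category 3 near map cell (20, 30).
semantic_map = np.zeros((10, 64, 64), dtype=np.float32)
enriched_map = inject_priors(semantic_map, [(3, (20, 30))], sigma=4.0)
```

Under this reading, the enriched map serves as the training target that the flow model learns to reach from partial observations, which is how the LLM's contextual knowledge gets distilled into the generative model.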