Recent advances in vision-language models have made zero-shot navigation feasible, enabling robots to follow natural language instructions without task-specific labeling. However, existing methods that explicitly store language vectors in grid- or node-based maps struggle to scale to large environments due to excessive memory requirements and limited resolution for fine-grained planning. We introduce LAMP (Language Map), a novel neural language field-based navigation framework that learns a continuous, language-driven map and directly leverages it for fine-grained path generation. Unlike prior approaches, our method encodes language features as an implicit neural field rather than storing them explicitly at every location. By combining this implicit representation with a sparse graph, LAMP supports efficient coarse path planning and then performs gradient-based optimization in the learned field to refine poses near the goal. This language-driven, gradient-guided, coarse-to-fine pipeline is the first application of an implicit language map to precise path generation. The refinement step is particularly effective at selecting goal regions that were never directly observed, as it exploits semantic similarities in the learned feature space. To further enhance robustness, we adopt a Bayesian framework that models embedding uncertainty via the von Mises-Fisher distribution, thereby improving generalization to unobserved regions. To scale to large environments, LAMP employs a graph sampling strategy that prioritizes spatial coverage and embedding confidence, retaining only the most informative nodes and substantially reducing computational overhead. Experimental results, both in NVIDIA Isaac Sim and in a real multi-floor building, demonstrate that LAMP outperforms existing explicit methods in both memory efficiency and fine-grained goal-reaching accuracy.
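The gradient-based refinement described above can be sketched in a toy form. This is not the authors' implementation: the RBF-mixture `field`, the anchor positions, and all parameter values are invented stand-ins for the learned implicit language field, and the gradient of the language-query similarity is approximated here by central finite differences rather than backpropagation. The sketch only illustrates the idea of ascending a similarity landscape from a coarse graph-planned pose toward a refined goal pose.

```python
import numpy as np

# Invented toy setup: a smooth "language field" built from a few anchor
# embeddings, standing in for LAMP's learned implicit neural field.
rng = np.random.default_rng(0)
D = 8                                    # embedding dimension (assumed)

anchors = rng.normal(size=(4, 2))        # anchor positions in the 2-D map
anchor_feats = rng.normal(size=(4, D))   # language embeddings at anchors
anchor_feats /= np.linalg.norm(anchor_feats, axis=1, keepdims=True)

def field(x):
    """Continuous language feature at position x (RBF interpolation)."""
    w = np.exp(-np.sum((anchors - x) ** 2, axis=1))  # RBF weights
    f = w @ anchor_feats
    return f / (np.linalg.norm(f) + 1e-8)            # unit-normalize

def score(x, query):
    """Cosine similarity between the field feature and the query embedding."""
    return float(field(x) @ query)

def refine(x0, query, steps=200, lr=0.05, eps=1e-4):
    """Gradient ascent on similarity, gradient via central differences."""
    x = np.asarray(x0, dtype=float).copy()
    best_s, best_x = score(x, query), x.copy()
    for _ in range(steps):
        g = np.zeros_like(x)
        for i in range(x.size):
            dx = np.zeros_like(x)
            dx[i] = eps
            g[i] = (score(x + dx, query) - score(x - dx, query)) / (2 * eps)
        x = x + lr * g
        s = score(x, query)
        if s > best_s:                   # keep the best pose seen so far
            best_s, best_x = s, x.copy()
    return best_x

query = anchor_feats[0]                       # query matching anchor 0
x_start = anchors[0] + np.array([0.7, -0.5])  # coarse pose from the graph
x_ref = refine(x_start, query)
assert score(x_ref, query) >= score(x_start, query)
```

In the full system the field is a neural network, so the similarity gradient with respect to the pose would come from automatic differentiation instead of finite differences; the coarse graph plan supplies the initialization `x_start`.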