Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with semantic concept grounding capabilities to interpret natural language tasks. This work aims to leverage these new capabilities with an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph representation of the environment and utilize a large language model (LLM) to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.
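To make the planning idea concrete, the following is a minimal sketch of multi-heuristic search over the product of a scene graph and a task automaton. It is not the authors' implementation. Everything named here is a hypothetical stand-in: the four-room graph, the hand-written automaton for "visit the kitchen, then the bedroom" (in the paper an LLM produces the automaton from natural language), the lookup table `LLM_GUESS` that plays the role of the LLM heuristic, and the simple anchor heuristic `h_ltl`, which is consistent in this toy domain but is not the LTL heuristic designed in the paper. The search loop is a simplified two-queue variant of multi-heuristic A*: a state is expanded from the guidance queue only when its key is within a factor `w2` of the anchor queue's best key.

```python
import heapq
from itertools import count

INF = float("inf")

def mha_star(start, is_goal, successors, h_anchor, h_guide, w2=1.0):
    """Simplified two-queue multi-heuristic A*. Returns (cost, path) or None."""
    g, parent = {start: 0.0}, {start: None}
    expanded_g = {}                         # state -> g value at its last expansion
    closed = {"anchor": set(), "guide": set()}
    heur = {"anchor": h_anchor, "guide": h_guide}
    heaps = {"anchor": [], "guide": []}
    tie = count()                           # tie-breaker so states are never compared
    best_goal = start if is_goal(start) else None

    def push(name, s):
        heapq.heappush(heaps[name], (g[s] + heur[name](s), next(tie), g[s], s))

    def min_key(name):
        # Lazily discard entries that were superseded or already expanded.
        heap = heaps[name]
        while heap:
            key, _, g_pushed, s = heap[0]
            if g_pushed == g[s] and expanded_g.get(s) != g[s]:
                return key
            heapq.heappop(heap)
        return INF

    push("anchor", start)
    push("guide", start)
    while min_key("anchor") < INF:
        k_anchor, k_guide = min_key("anchor"), min_key("guide")
        # Follow the guidance queue only while it stays within w2 of the anchor.
        name, key = ("guide", k_guide) if k_guide <= w2 * k_anchor else ("anchor", k_anchor)
        if best_goal is not None and g[best_goal] <= key:
            break                           # no queued state can beat the best goal found
        s = heapq.heappop(heaps[name])[3]
        expanded_g[s] = g[s]
        closed[name].add(s)
        for nxt, cost in successors(s):
            new_g = g[s] + cost
            if new_g >= g.get(nxt, INF):
                continue
            g[nxt], parent[nxt] = new_g, s
            if is_goal(nxt) and (best_goal is None or new_g < g[best_goal]):
                best_goal = nxt
            if nxt in closed["anchor"]:
                continue
            push("anchor", nxt)
            if nxt not in closed["guide"] and new_g + h_guide(nxt) <= w2 * (new_g + h_anchor(nxt)):
                push("guide", nxt)
    if best_goal is None:
        return None
    path, s = [], best_goal
    while s is not None:
        path.append(s)
        s = parent[s]
    return g[best_goal], path[::-1]

# Hypothetical toy scene graph: undirected room connectivity with travel costs.
EDGES = {("hall", "kitchen"): 2.0, ("hall", "bedroom"): 1.0,
         ("kitchen", "bedroom"): 4.0, ("hall", "bath"): 1.0}
NEIGHBORS = {}
for (a, b), c in EDGES.items():
    NEIGHBORS.setdefault(a, []).append((b, c))
    NEIGHBORS.setdefault(b, []).append((a, c))

def step_automaton(q, room):
    # Hand-written automaton for "visit the kitchen, then the bedroom"; q == 2 accepts.
    if q == 0 and room == "kitchen":
        return 1
    if q == 1 and room == "bedroom":
        return 2
    return q

def successors(state):
    # Product domain: move in the graph and advance the automaton on the new room.
    room, q = state
    return [((nxt, step_automaton(q, nxt)), cost) for nxt, cost in NEIGHBORS[room]]

MIN_EDGE = min(EDGES.values())

def h_ltl(state):
    # Stand-in anchor heuristic: every move costs at least MIN_EDGE and advances
    # the automaton by at most one step, so this estimate is consistent here.
    return MIN_EDGE * (2 - state[1])

# Stand-in for LLM guidance: hand-written guesses, some of which overestimate.
LLM_GUESS = {("hall", 0): 5.0, ("bath", 0): 9.0, ("bedroom", 0): 9.0,
             ("kitchen", 1): 3.0, ("hall", 1): 1.0, ("bath", 1): 4.0}

def h_llm(state):
    return LLM_GUESS.get(state, 0.0)

cost, path = mha_star(("hall", 0), lambda s: s[1] == 2, successors, h_ltl, h_llm)
print(cost, [room for room, _ in path])
```

In this toy example the direct kitchen-to-bedroom edge is more expensive than returning through the hall, so the plan found with `w2=1.0` goes back through the hall. Raising `w2` above 1 lets the guidance queue drive more expansions, in exchange for a weaker bound on solution cost.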