Large Language Models (LLMs) trained on massive text datasets have recently shown promise in generating action plans for robotic agents from high-level text queries. However, these models typically do not consider the robot's environment, so the generated plans may not be executable, either because the planned actions are ambiguous or because they violate environmental constraints. In this paper, we propose an approach to generate environmentally aware action plans that agents are better able to execute. Our approach integrates environmental objects and object relations as additional inputs into LLM action plan generation to give the system an awareness of its surroundings, resulting in plans where each generated action is mapped to objects present in the scene. We also design a novel scoring function that, along with generating the action steps and associating them with objects, helps the system disambiguate among object instances and take into account their states. We evaluated our approach using the VirtualHome simulator and the ActivityPrograms knowledge base and found that action plans generated by our system had a 310% improvement in executability and a 147% improvement in correctness over prior work. The complete code and a demo of our method are publicly available at https://github.com/hri-ironlab/scene_aware_language_planner.
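To make the idea of grounding each action in the scene more concrete, the following is a minimal illustrative sketch, not the scoring function from the paper. It assumes each candidate step arrives with a language-model score, and it combines that score with a simple check that the target object exists in the scene and that its state does not already make the action redundant. The names `SceneObject`, `BLOCKING_STATES`, `instance_score`, and `ground_step`, as well as the 0.5 penalty, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str
    instance_id: int
    states: set = field(default_factory=set)


# Hypothetical table: a state that makes an action redundant for an instance.
BLOCKING_STATES = {"open": "OPEN", "close": "CLOSED",
                   "switchon": "ON", "switchoff": "OFF"}


def instance_score(action, obj, target_name):
    """Return 0 if the object does not match, a reduced score if its
    state makes the action redundant, and 1 otherwise (assumed weights)."""
    if obj.name != target_name:
        return 0.0
    if BLOCKING_STATES.get(action) in obj.states:
        return 0.5
    return 1.0


def ground_step(candidates, scene):
    """Pick the best (score, action, object instance) triple.

    candidates: list of (action, object_name, lm_score) tuples.
    Candidates whose object is absent from the scene are discarded.
    """
    best = None
    for action, target, lm_score in candidates:
        for obj in scene:
            score = lm_score * instance_score(action, obj, target)
            if score > 0 and (best is None or score > best[0]):
                best = (score, action, obj)
    return best


# Example: the highest-scoring LLM candidate names an object that is not
# in the scene, and one of the two fridges is already open.
scene = [
    SceneObject("fridge", 1, {"OPEN"}),
    SceneObject("fridge", 2, {"CLOSED"}),
    SceneObject("table", 3),
]
candidates = [("open", "cabinet", 0.9), ("open", "fridge", 0.6)]
best = ground_step(candidates, scene)
```

In this toy example the "cabinet" candidate is dropped because no such object is present, and the closed fridge (instance 2) is preferred over the one that is already open. A real system would need a richer model of object relations and states than this sketch provides.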