Recent large language models (LLMs) have demonstrated remarkable performance on a variety of natural language processing (NLP) tasks, leading to intense excitement about their applicability across various domains. Unfortunately, recent work has also shown that LLMs can neither reason accurately nor solve planning problems, which may limit their usefulness for robotics-related tasks. In this work, our central question is whether LLMs are able to translate goals specified in natural language into a structured planning language. If so, an LLM can act as a natural interface between the planner and human users; the translated goal can be handed to domain-independent AI planners that are very effective at planning. Our empirical results on GPT 3.5 variants show that LLMs are much better suited to translation than to planning. We find that LLMs are able to leverage commonsense knowledge and reasoning to fill in details missing from under-specified goals (as is often the case in natural language). However, our experiments also reveal that LLMs can fail to generate goals in tasks that involve numerical or physical (e.g., spatial) reasoning, and that LLMs are sensitive to the prompts used. As such, these models are promising for translation to structured planning languages, but care should be taken in their use.