A Framework for Neurosymbolic Robot Action Planning using Large Language Models

From arXiv: 36 pages, 7 figures, 2 tables. Updated according to reviewers' comments. Previous title: A Framework to Generate Neurosymbolic PDDL-compliant Planners

Symbolic task planning is a widely used approach to enabling robot autonomy because it is easy to understand and to deploy in robot architectures. However, symbolic task planning techniques are difficult to scale to real-world human-robot collaboration scenarios, because they perform poorly in complex planning domains or when frequent re-planning is needed. We present a framework, Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is to train Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then to leverage its generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) better scalability as planning domain complexity increases, since an LLM's response time scales linearly with the combined length of the input and the output, and (ii) the ability to synthesize a plan action-by-action instead of end-to-end, making each action available for execution as soon as it is generated rather than only once the whole plan is available, which in turn enables concurrent planning and execution. Recently, the research community has devoted significant effort to evaluating the cognitive capabilities of LLMs, with mixed success. With Teriyaki we aim instead to provide an overall planning performance comparable to that of traditional planners in specific planning domains, while leveraging LLMs' capabilities to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than those of a traditional symbolic planner; and (iii) reduce the average overall waiting time for a plan to become available by up to 61.4%.
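The benefit of action-by-action synthesis can be illustrated with a small simulation. The sketch below is not the Teriyaki implementation: the planner is a stub that replays a fixed list of PDDL-style actions, and the per-action generation time, the execution time, and all function names are hypothetical values chosen only to make the arithmetic easy to follow. It compares how long an executor sits idle when it must wait for the whole plan with how long it idles when each action is released as soon as it is generated.

```python
from typing import Iterator, List, Tuple

# Hypothetical cost of generating one action, in seconds (not a figure from the paper).
GEN_TIME_PER_ACTION = 0.5

# Hypothetical PDDL-style plan used as a stand-in for real planner output.
PLAN: List[str] = [
    "(pick block_a)",
    "(place block_a table)",
    "(pick block_b)",
    "(stack block_b block_a)",
]


def generate_plan_stream(actions: List[str]) -> Iterator[Tuple[float, str]]:
    """Yield (ready_at, action) pairs, one action at a time.

    Stand-in for a generative planner: the fixed action list replaces
    a real model call, and ready_at is a simulated clock value.
    """
    clock = 0.0
    for action in actions:
        clock += GEN_TIME_PER_ACTION  # simulated generation latency
        yield clock, action


def end_to_end_wait(actions: List[str]) -> float:
    """Idle time when execution starts only after the full plan exists."""
    last_ready = 0.0
    for ready_at, _ in generate_plan_stream(actions):
        last_ready = ready_at
    return last_ready


def action_by_action_wait(actions: List[str], exec_time: float) -> float:
    """Total idle time when each action runs as soon as it is available."""
    idle = 0.0
    executor_free_at = 0.0
    for ready_at, _ in generate_plan_stream(actions):
        start = max(ready_at, executor_free_at)
        idle += start - executor_free_at  # time spent waiting for this action
        executor_free_at = start + exec_time
    return idle


if __name__ == "__main__":
    print("end-to-end wait:", end_to_end_wait(PLAN))
    print("action-by-action wait:", action_by_action_wait(PLAN, exec_time=1.0))
```

Under these assumed timings the executor only waits for the first action, because every later action is generated while the previous one is still executing. When generation is slower than execution the saving shrinks, so the actual gain depends on the domain and on the robot's action durations.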
