Large language models (LLMs) have been shown to fail to create executable and verifiable plans in grounded environments. An emerging line of work shows success in using an LLM as a formalizer that generates a formal representation of the planning domain (e.g., in PDDL), which can then be solved deterministically to find a plan. We systematically evaluate this methodology while bridging some major gaps. Whereas previous work generates only a partial PDDL representation from templated, and thus unrealistic, environment descriptions, we generate the complete representation from descriptions of varying levels of naturalness. Among an array of observations critical to improving LLMs' formal planning ability, we note that sufficiently large models can effectively formalize descriptions as PDDL, outperforming direct plan generation while remaining robust to lexical perturbation. As the descriptions become more natural-sounding, we observe a decrease in performance and provide a detailed error analysis.
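As a rough illustration of the formalize-then-solve pipeline described in the abstract, the sketch below runs a deterministic search over a hand-written, STRIPS-style problem. Everything in it is an assumption made for illustration: the `Action` type, the `solve` function, and the two-block example are hypothetical, the Python data stands in for the PDDL an LLM would generate, and the breadth-first search stands in for an off-the-shelf PDDL planner. It is not the setup evaluated in this work.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """A ground STRIPS-style action (hypothetical stand-in for a PDDL action)."""
    name: str
    pre: frozenset     # facts that must hold before the action applies
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def solve(init, goal, actions):
    """Breadth-first search over states; returns a shortest plan or None."""
    start = frozenset(init)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return plan
        for a in actions:
            if a.pre <= state:  # action is applicable
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [a.name]))
    return None  # goal unreachable from the initial state


# Hypothetical two-block problem: block a starts on block b; the goal is b on a.
actions = [
    Action("unstack-a-b",
           frozenset({"on-a-b", "clear-a", "hand-empty"}),
           frozenset({"holding-a", "clear-b"}),
           frozenset({"on-a-b", "clear-a", "hand-empty"})),
    Action("putdown-a",
           frozenset({"holding-a"}),
           frozenset({"ontable-a", "clear-a", "hand-empty"}),
           frozenset({"holding-a"})),
    Action("pickup-b",
           frozenset({"ontable-b", "clear-b", "hand-empty"}),
           frozenset({"holding-b"}),
           frozenset({"ontable-b", "clear-b", "hand-empty"})),
    Action("stack-b-a",
           frozenset({"holding-b", "clear-a"}),
           frozenset({"on-b-a", "clear-b", "hand-empty"}),
           frozenset({"holding-b", "clear-a"})),
]
init = {"on-a-b", "ontable-b", "clear-a", "hand-empty"}
goal = {"on-b-a"}

plan = solve(init, goal, actions)
print(plan)
```

The point of the split is that the search is deterministic and any plan it returns can be checked against the action definitions, so errors can only enter through the formalization step, which is the part an LLM would be responsible for.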