LLMs are increasingly used for planning-style tasks, but their planning and reasoning capabilities remain poorly understood. We present a novel method for automatically converting planning benchmarks written in PDDL into textual descriptions, and we offer a benchmark dataset created with this method. We show that while the best LLM planners do well on many planning tasks, others remain out of reach of current methods.