Large language models (LLMs) have demonstrated strong reasoning capabilities on math and coding tasks, but frequently fail on symbolic classical planning. Our studies, as well as prior work, show that LLM-generated plans routinely violate domain constraints stated in their instructions (e.g., walking through walls). To address this failure, we propose iteratively augmenting instructions with Localized In-Context Learning (L-ICL) demonstrations: targeted corrections for specific failing steps. Specifically, L-ICL identifies the first constraint violation in a trace and injects a minimal input-output example giving the correct behavior for the failing step. L-ICL is far more effective than explicit instructions, traditional ICL (which adds complete problem-solving trajectories), and many other baselines. For example, on an 8x8 gridworld, L-ICL produces valid plans 89% of the time with only 60 training examples, compared to 59% for the best baseline, an absolute improvement of 30 percentage points. L-ICL also shows dramatic improvements across domains (gridworld navigation, mazes, Sokoban, and BlocksWorld) and across several LLM architectures.
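The L-ICL loop described above can be sketched in a few lines. This is a minimal illustrative toy, not the paper's implementation: the gridworld validator, the `stub_model` that stands in for an LLM, and the helper names (`first_violation`, `make_correction_example`, `l_icl_loop`) are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of L-ICL: detect the first constraint violation in a
# generated plan, inject a minimal localized correction into the prompt
# context, and retry. The gridworld and stub model below are toy assumptions.

def first_violation(plan, walls, start):
    """Return the index of the first step that enters a wall, else None."""
    moves = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    x, y = start
    for i, m in enumerate(plan):
        dx, dy = moves[m]
        x, y = x + dx, y + dy
        if (x, y) in walls:
            return i
    return None

def make_correction_example(plan, step):
    # A localized demo targets only the failing step, in contrast to
    # traditional ICL, which would add a complete solution trajectory.
    return f"At step {step}, '{plan[step]}' hits a wall; choose a legal move instead."

def l_icl_loop(model, problem, walls, start, max_rounds=3):
    demos = []  # minimal input-output examples accumulated across rounds
    for _ in range(max_rounds):
        plan = model(problem, demos)
        step = first_violation(plan, walls, start)
        if step is None:
            return plan  # valid plan found
        demos.append(make_correction_example(plan, step))
    return None  # no valid plan within the round budget

# Stub standing in for an LLM: proposes R,R until corrected, then detours.
def stub_model(problem, demos):
    return ["R", "U", "R"] if demos else ["R", "R"]

walls = {(2, 0)}
plan = l_icl_loop(stub_model, "reach (2,1) from (0,0)", walls, (0, 0))
```

With the wall at (2, 0), the stub's first plan `["R", "R"]` is rejected at step 1, a single localized correction is injected, and the retry `["R", "U", "R"]` passes validation.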