Symbolic planners can discover a sequence of actions from initial to goal states given expert-defined, domain-specific logical action semantics. Large Language Models (LLMs) can directly generate such sequences, but limitations in reasoning and state-tracking often result in plans that are insufficient or unexecutable. We propose Predicting Semantics of Actions with Language Models (PSALM), which automatically learns action semantics by leveraging the strengths of both symbolic planners and LLMs. PSALM repeatedly proposes and executes plans, using the LLM to partially generate plans and to infer domain-specific action semantics from execution outcomes. PSALM maintains a belief over possible action semantics and updates it iteratively until a goal state is reached. Experiments on 7 environments show that, when learning from just one goal, PSALM boosts plan success rate from 36.4% (on Claude-3.5) to 100%, and explores the environment more efficiently than prior work to infer the ground-truth domain action semantics.
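The propose, execute, and update loop described above can be illustrated with a minimal sketch. This is not the PSALM implementation: the three-action door domain, the names `TRUE_ACTIONS`, `plan`, `execute`, and `learn`, and the feedback format are all hypothetical. The sketch further assumes that action effects are already known and only preconditions must be learned, that the environment reports which facts were missing when an action fails, and that a simple rule-based update stands in for the LLM's inference step. A breadth-first search stands in for the symbolic planner.

```python
from collections import deque

# Hypothetical toy domain (not from the paper). Hidden ground truth:
# action name -> (preconditions, add effects, delete effects).
TRUE_ACTIONS = {
    "get_key":   (frozenset(),             frozenset({"has_key"}),   frozenset()),
    "unlock":    (frozenset({"has_key"}),  frozenset({"unlocked"}),  frozenset()),
    "open_door": (frozenset({"unlocked"}), frozenset({"door_open"}), frozenset()),
}

def apply_effects(state, add, delete):
    """Return the successor state after applying add and delete effects."""
    return (state - delete) | add

def plan(init, goal, believed_pre):
    """Breadth-first search under the *believed* preconditions.

    Stand-in for a symbolic planner. Assumption: effects are known,
    so they are read directly from TRUE_ACTIONS.
    """
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, (_, add, delete) in TRUE_ACTIONS.items():
            if believed_pre[name] <= state:
                nxt = apply_effects(state, add, delete)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None  # no plan exists under the current belief

def execute(init, steps):
    """Run a plan against the hidden ground truth.

    Returns (success, failed_action, missing_facts). Reporting the
    missing facts is an assumption made to keep the update trivial.
    """
    state = init
    for name in steps:
        pre, add, delete = TRUE_ACTIONS[name]
        if not pre <= state:
            return False, name, pre - state
        state = apply_effects(state, add, delete)
    return True, None, frozenset()

def learn(init, goal, max_iters=10):
    """Propose a plan, execute it, and refine the belief until the plan succeeds."""
    # Optimistic initial belief: no action has any precondition.
    believed_pre = {name: frozenset() for name in TRUE_ACTIONS}
    for iteration in range(1, max_iters + 1):
        steps = plan(init, goal, believed_pre)
        if steps is None:
            return None, believed_pre, iteration
        ok, failed, missing = execute(init, steps)
        if ok:
            return steps, believed_pre, iteration
        # Rule-based stand-in for the LLM's semantics inference.
        believed_pre[failed] = believed_pre[failed] | missing
    return None, believed_pre, max_iters

if __name__ == "__main__":
    steps, belief, iters = learn(frozenset(), frozenset({"door_open"}))
    print(steps, iters)
```

On this toy domain, the first two proposed plans fail during execution, each failure adds one precondition to the belief, and the third plan (`get_key`, `unlock`, `open_door`) reaches the goal. The optimistic initial belief is a design choice of the sketch: it lets the planner propose short plans early and rely on execution failures to expose the missing constraints.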