Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex, contextualized situations that are often counterfactual, e.g., "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models, and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires revising a plan to cope with a counterfactual situation. In both the original and counterfactual settings, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete with, and often surpass, the capabilities of their larger teacher models.