Recent advances in Large Language Models (LLMs) have given rise to the emergent ability of chain-of-thought (CoT) prompting, a reasoning strategy that constructs prompts by inserting intermediate rationale steps between questions and answers. Conditioned on these prompts, LLMs can learn in context to generate rationales that lead to more accurate answers than answering the same question directly. One important setting in LLM prompt design, called demonstration selection, selects demonstrations from an example bank. Existing methods rely on various heuristics for this selection, but because CoT prompting involves unique rationales, it is essential to base the selection on the intrinsic skills that CoT rationales require, for instance, addition or subtraction for math word problems. To address this requirement, we introduce a novel approach named Reasoning Skill Discovery (RSD) that uses unsupervised learning to create a latent space representation of rationales, called a reasoning skill. Simultaneously, RSD learns a reasoning policy that determines the required reasoning skill for a given question. This policy can then guide the selection of examples that demonstrate the required reasoning skills. Our approach offers several desirable properties: it is (1) theoretically grounded, (2) sample-efficient, requiring no LLM inference or manual prompt design, and (3) LLM-agnostic. Empirically, RSD outperforms existing methods by up to 6% in answer accuracy across multiple reasoning tasks.
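The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how reasoning skills are encoded or how a predicted skill is matched against the example bank, so everything here is an assumption made for illustration: the skill vectors are hand-written stand-ins for the outputs of a learned encoder and reasoning policy, cosine similarity is one plausible matching rule, and the names `cosine` and `select_demonstrations` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_demonstrations(predicted_skill, bank, k=2):
    """Return the ids of the k bank examples whose skill vectors best match the prediction.

    predicted_skill: skill vector for the test question (assumed to come from a
        learned reasoning policy; hand-written here).
    bank: list of (example_id, skill_vector) pairs (assumed to come from an
        encoder applied to each example's rationale; hand-written here).
    """
    ranked = sorted(
        bank,
        key=lambda item: cosine(predicted_skill, item[1]),
        reverse=True,
    )
    return [example_id for example_id, _ in ranked[:k]]


# Toy example bank: dimension 0 loosely stands for "addition",
# dimension 1 for "subtraction". These values are illustrative only.
bank = [
    ("add_1", [1.0, 0.0]),
    ("sub_1", [0.0, 1.0]),
    ("add_2", [0.9, 0.1]),
    ("mixed", [0.5, 0.5]),
]

# Pretend the policy predicts that the question needs the "addition" skill.
predicted = [1.0, 0.0]
chosen = select_demonstrations(predicted, bank, k=2)
print(chosen)
```

With these toy vectors the two addition-like examples are ranked first, and they would then be placed in the CoT prompt as demonstrations. In a real system the vectors would come from trained models rather than being written by hand.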