Reasoning in Large Language Models (LLMs) often suffers from inefficient long chain-of-thought traces full of redundant self-exploration and validation, which inflate computational cost and can even degrade performance. Inspired by human reasoning, where people solve new problems by leveraging related past cases to constrain the search space and reduce trial and error, we propose Precedent Informed Reasoning (PIR), which shifts LRMs' reasoning paradigm from exhaustive self-exploration to guided learning from precedents. PIR addresses two key challenges: which precedents to adopt and how to utilize them. First, Adaptive Precedent Selection (APS) constructs, for each question and LRM, a compact set of precedents that are both semantically related and informative for the model. It ranks candidate examples by a joint score combining semantic similarity and model perplexity, then adapts the number of precedents to maximize perplexity reduction. Second, Test-time Experience Internalization (TEI) treats precedent-informed instruction as test-time learning, updating lightweight adapters to internalize solution patterns and use them as a prior during subsequent reasoning. Experiments on mathematical reasoning, scientific QA, and code generation show that PIR consistently shortens reasoning traces while maintaining or improving final accuracy across LLMs, yielding strong accuracy-efficiency trade-offs.
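The APS step described above, ranking by a joint similarity-perplexity score and then growing the precedent set only while perplexity keeps dropping, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the linear weighting `alpha`, and the `ppl_with_prefix` callback (standing in for a real LRM perplexity query) are all assumptions.

```python
def joint_score(similarity: float, perplexity: float, alpha: float = 0.5) -> float:
    """Combine semantic similarity (higher is better) with model perplexity
    (lower is better) into one ranking score. The linear mix and the use of
    1/perplexity are illustrative assumptions."""
    return alpha * similarity + (1 - alpha) * (1.0 / perplexity)

def select_precedents(candidates, ppl_with_prefix, max_k=5):
    """Rank candidates by joint score, then adaptively grow the precedent
    set while the model's perplexity on the target question keeps dropping.

    candidates: list of (example_id, similarity, perplexity) tuples
    ppl_with_prefix: callable mapping a list of example_ids to the model's
        perplexity on the question given those precedents (hypothetical
        stand-in for an actual LRM call).
    """
    ranked = sorted(candidates, key=lambda c: joint_score(c[1], c[2]), reverse=True)
    chosen, best_ppl = [], ppl_with_prefix([])
    for ex_id, _sim, _ppl in ranked[:max_k]:
        trial = chosen + [ex_id]
        trial_ppl = ppl_with_prefix(trial)
        if trial_ppl < best_ppl:   # keep the precedent only if it reduces perplexity
            chosen, best_ppl = trial, trial_ppl
        else:
            break                  # stop once adding more precedents stops helping
    return chosen
```

The greedy stop condition makes the precedent set adaptive per question: an easy question whose perplexity is already low admits few or no precedents, while a harder one accumulates more until the marginal reduction vanishes.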