ReasonCACHE: Teaching LLMs To Reason Without Weight Updates

Can large language models (LLMs) learn to reason without any weight updates, using only in-context learning (ICL)? ICL is strikingly sample-efficient, often learning from only a handful of demonstrations, but complex reasoning tasks typically demand many training examples. Naively scaling ICL by adding more demonstrations breaks down in this regime: attention costs grow quadratically, performance saturates or degrades with longer contexts, and the approach remains a shallow form of learning. Because of these limitations, practitioners predominantly rely on in-weight learning (IWL) to induce reasoning. In this work, we show that, by using Prefix Tuning, LLMs can learn to reason without overloading the context window and without any weight updates. We introduce $\textbf{ReasonCACHE}$, an instantiation of this mechanism that distills demonstrations into a fixed key-value cache. Empirically, across challenging reasoning benchmarks, including GPQA-Diamond, ReasonCACHE outperforms standard ICL and matches or surpasses IWL approaches. It does so while being more efficient along three key axes: data, inference cost, and trainable parameters. We also prove theoretically that ReasonCACHE can be strictly more expressive than low-rank weight updates: the latter tie expressivity to input rank, whereas ReasonCACHE bypasses this constraint by injecting key-values directly into the attention mechanism. Together, our findings identify ReasonCACHE as a middle path between in-context and in-weight learning, providing a scalable algorithm for learning reasoning skills beyond the context window without modifying parameters. Our project page: https://reasoncache.github.io/
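To make the mechanism concrete, the following is a minimal sketch of attention with an injected key-value prefix, written in plain Python for a single head and a single query vector. It is an illustration under stated assumptions, not the ReasonCACHE implementation: the function names (`attend`, `attend_with_prefix`) and the toy vectors are hypothetical, and the prefix here is set by hand, whereas in Prefix Tuning it would be learned by gradient descent while the model weights stay frozen.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    # Scaled dot-product attention for one query vector (single head).
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    # Same attention, with extra key-value pairs prepended to the cache.
    # The projections that produced `keys` and `values` are untouched;
    # only the prefix would be trainable (assumption: hand-set here).
    return attend(query, prefix_keys + keys, prefix_values + values)

# Toy context: both context values have a zero second coordinate.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [2.0, 0.0]]

# Hypothetical prefix entry whose value points along the second coordinate.
prefix_keys = [[1.0, 0.0]]
prefix_values = [[0.0, 1.0]]

base = attend(query, keys, values)
steered = attend_with_prefix(query, keys, values, prefix_keys, prefix_values)
unchanged = attend_with_prefix(query, keys, values, [], [])
```

In this toy setting, `base` stays in the span of the context values (its second coordinate is zero), while `steered` picks up a nonzero second coordinate from the prefix value. This is only an informal picture of how injected key-values can move the attention output in directions the input alone does not supply; it does not reproduce the paper's expressivity proof. The prefix also has a fixed length, so its cost does not grow with the number of demonstrations it was distilled from.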
