Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited to complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT first uses gold-relevance distillation to train a retriever that ranks contexts by expected reasoning benefit rather than semantic overlap. It then fine-tunes the policy model with reinforcement fine-tuning methods, conditioning on the retrieved analogous demonstrations, so that the model learns to leverage their reasoning traces under verifiable outcome rewards. We further analyze the diversity of retrieved contexts and find that reasoning-aware retrieval surfaces complementary solution strategies that provide distinct reasoning scaffolds for individual problems. Across challenging mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning methods. For example, it improves AIME 2025 average@32 accuracy over GRPO by 7.1 points for Qwen3-1.7B and 2.8 points for Qwen3-4B. This suggests that reasoning-aware retrieval is a complementary axis of improvement, orthogonal to advances in reward design or training curricula.
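The difference between ranking by similarity and ranking by expected reasoning benefit can be made concrete with a small sketch. The code below is an illustration only, not the RA-RFT implementation: it assumes that "reasoning benefit" can be estimated as the gain in success rate when a candidate demonstration is placed in the prompt, and that average@k is the mean correctness over k sampled answers per problem, averaged over problems. The function names, candidate labels, and 0/1 outcomes are all hypothetical toy values.

```python
def success_rate(outcomes):
    """Fraction of sampled answers that were verified correct (1) vs. wrong (0)."""
    return sum(outcomes) / len(outcomes)


def reasoning_benefit(with_context, without_context):
    """Assumed proxy for benefit: gain in success rate from adding the context."""
    return success_rate(with_context) - success_rate(without_context)


def rank_by_benefit(candidates, baseline):
    """Sort candidate demonstrations by estimated benefit, highest first."""
    return sorted(
        candidates,
        key=lambda name: reasoning_benefit(candidates[name], baseline),
        reverse=True,
    )


def average_at_k(per_problem_outcomes):
    """Assumed average@k: mean per-problem success rate, as a percentage."""
    rates = [success_rate(samples) for samples in per_problem_outcomes]
    return 100.0 * sum(rates) / len(rates)


# Hypothetical toy data: four sampled answers per setting for one problem.
baseline = [0, 0, 1, 0]  # no retrieved demonstration in the prompt
candidates = {
    # Looks similar on the surface but follows a different solution strategy.
    "same_topic_different_strategy": [0, 0, 1, 0],
    # Looks different on the surface but shares the reasoning pattern.
    "different_topic_same_strategy": [1, 1, 1, 0],
}

ranking = rank_by_benefit(candidates, baseline)
print(ranking)
```

In this toy setting the superficially different demonstration ranks first because it raises the success rate, while the similar-looking one adds nothing. A similarity-based retriever could easily have chosen the opposite order.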

