A central piece in enabling intelligent agentic behavior in foundation models is making them capable of introspecting on their behavior, reasoning about it, and correcting their mistakes as more computation or interaction becomes available. Even the strongest proprietary large language models (LLMs) do not exhibit the ability to continually improve their responses sequentially, even in scenarios where they are explicitly told that they are making a mistake. In this paper, we develop RISE: Recursive IntroSpEction, an approach for fine-tuning LLMs to introduce this capability, even though prior work has hypothesized that it may be unattainable. Our approach prescribes an iterative fine-tuning procedure that teaches the model how to alter its response after previously unsuccessful attempts to solve a hard test-time problem, optionally with additional environment feedback. RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt. Inspired by principles in online imitation learning and reinforcement learning, we propose strategies for multi-turn data collection and training so as to imbue an LLM with the capability to recursively detect and correct its previous mistakes in subsequent iterations. Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves over successive turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. We also find that RISE scales well, often attaining larger benefits with more capable models. Our analysis shows that RISE makes meaningful improvements to responses to arrive at the correct solution for challenging prompts, without disrupting one-turn abilities, despite having to express more complex distributions.
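The multi-turn MDP construction described above can be sketched minimally: the initial state is the prompt, each transition appends the model's attempt and any environment feedback to the state, and a sparse terminal reward scores correctness. This is an illustrative sketch under assumed names (`State`, `step`, `reward`), not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """MDP state: the original prompt plus the interaction so far."""
    prompt: str
    history: list = field(default_factory=list)  # (attempt, feedback) pairs

def step(state: State, attempt: str, feedback: str = "") -> State:
    """Transition: append the latest attempt and optional feedback, so the
    policy's next response can condition on its previous mistakes."""
    return State(state.prompt, state.history + [(attempt, feedback)])

def reward(attempt: str, answer: str) -> float:
    """Sparse terminal reward: 1 if the final answer matches, else 0."""
    return 1.0 if attempt.strip() == answer.strip() else 0.0

# Example rollout: a first wrong attempt, feedback, then a corrected attempt.
s0 = State("What is 12 * 7?")
s1 = step(s0, "74", feedback="Incorrect, try again.")
s2 = step(s1, "84")
```

Under this view, fine-tuning for recursive introspection amounts to training the policy to map each successive state (prompt plus prior failed attempts) to a better next response.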