INFUSER: Influence-Guided Self-Evolution Improves Reasoning - 专知论文

会员服务 ·

0

汇聚 · 得分 · 路径 · 语言模型化 · MoDELS ·

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

翻译：暂无翻译

Siyu Chen,Miao Lu,Beining Wu,Heejune Sheen,Fengzhuo Zhang,Shuangning Li,Zhiyuan Li,Jose Blanchet,Tianhao Wang,Zhuoran Yang

from arxiv, 66 pages, 17 figures

Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with only minimal external supervision. Yet existing methods either depend on extensively curated or teacher-generated training data, or, when the generator runs unsupervised, reward it by a difficulty heuristic that need not improve the solver. We introduce INFUSER, an iterative co-training framework with two co-evolving roles: a Generator that drafts questions and reference golden answers from a pool of unstructured, automatically collected documents, and a Solver that improves by training on them. The solver is trained with standard correctness rewards against the generator-provided answers, while the generator is rewarded by an optimizer-aware influence score that measures whether each proposed question would actually improve the solver on the target distribution. Because this continuous, noisy influence score is poorly served by standard GRPO, we propose DuGRPO, a dual-normalized variant of GRPO, for generator training. Together, these turn the document pool into an adaptive curriculum that favors questions useful to the current solver, not just hard ones. On Qwen3-8B-Base, INFUSER outperforms strong self-evolution baselines with over 20% relative improvement on Olympiad and SuperGPQA benchmarks, and an 8B INFUSER co-evolving generator outperforms a frozen 32B thinking generator on math and coding. Ablations confirm each design choice is necessary, and two extensions, applying INFUSER to an instruction-finetuned anchor and augmenting it with rule-verifiable RLVR data, further demonstrate the flexibility and generalizability of the framework. Code is available at https://github.com/FFishy-git/INFUSER.

翻译：暂无翻译

0

相关内容

BES：让语言模型通过双向进化搜索自我改进

BES：让语言模型通过双向进化搜索自我改进

专知会员服务

8+阅读 · 5月30日

多模态大语言模型的自我改进：综述

多模态大语言模型的自我改进：综述

专知会员服务

28+阅读 · 2025年10月8日

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

专知会员服务

14+阅读 · 2025年3月6日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【ICML2021】具有持续进化策略的展开计算图的无偏梯度估计

专知会员服务

11+阅读 · 2021年8月10日

【ICML2021】SparseBERT: 自注意力机制的重要性分析再思考

专知会员服务

37+阅读 · 2021年5月15日

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

专知会员服务

76+阅读 · 2020年4月10日

【Google AI论文】无妥协的弱监督解缠，Weakly-Supervised Disentanglement Without Compromises

【Google AI论文】无妥协的弱监督解缠，Weakly-Supervised Disentanglement Without Compromises

专知会员服务

20+阅读 · 2020年2月12日

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

专知会员服务

25+阅读 · 2019年12月26日

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

专知会员服务

10+阅读 · 2019年10月26日

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

专知

36+阅读 · 2022年2月7日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

解读自监督学习(Self-Supervised Learning)几篇相关paper

解读自监督学习(Self-Supervised Learning)几篇相关paper

CVer

25+阅读 · 2020年2月21日

【加州理工】什么是模仿学习(Imitation Learning（模仿学习), 这62页ppt带你了解进展，附下载

【加州理工】什么是模仿学习(Imitation Learning（模仿学习), 这62页ppt带你了解进展，附下载

专知

21+阅读 · 2019年11月14日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

自定义损失函数Gradient Boosting

自定义损失函数Gradient Boosting

AI研习社

14+阅读 · 2018年10月16日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

【论文笔记】自注意力机制学习句子embedding

【论文笔记】自注意力机制学习句子embedding

专知

15+阅读 · 2018年5月17日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

人类转录因子基因家族调控网络进化模式研究

国家自然科学基金

0+阅读 · 2015年12月31日

自相似序列的无理指数、分形及相关问题

国家自然科学基金

0+阅读 · 2015年12月31日

基于非独立同分布学习理论的图模型词义消歧及领域适应方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

个性化特征大数据支持下的交互式进化计算及其应用

国家自然科学基金

1+阅读 · 2015年12月31日

强调与对比影响语篇理解的认知过程及其神经机制

国家自然科学基金

4+阅读 · 2015年12月31日

可与MPSoC高度融合的片上自主测试-自主修复关键技术研究：针对自然、人为可靠性威胁

国家自然科学基金

0+阅读 · 2015年12月31日

进化算法行为分析及应用

国家自然科学基金

1+阅读 · 2015年12月31日

中文句子语义概念图自动构建方法及应用研究

国家自然科学基金

3+阅读 · 2014年12月31日

半监督进化文本聚类算法在动态多源文本分析上的研究

国家自然科学基金

2+阅读 · 2014年12月31日

Forward-Looking与Backward-Looking相结合的投资组合管理

国家自然科学基金

1+阅读 · 2014年12月31日

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

Arxiv

0+阅读 · 6月23日

Improving Reasoning in Vision-Language Models via Perception Verified Self-Training

Arxiv

0+阅读 · 6月20日

MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis

Arxiv

0+阅读 · 6月19日

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

Arxiv

0+阅读 · 6月17日

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

Arxiv

0+阅读 · 6月17日

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

Arxiv

0+阅读 · 6月17日

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

Arxiv

0+阅读 · 6月17日

Self-CTRL: Self-Consistency Training with Reinforcement Learning

Arxiv

0+阅读 · 6月16日

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

Arxiv

0+阅读 · 6月8日

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Arxiv

13+阅读 · 2020年7月3日

VIP会员

文章信息

相关主题

语言模型化

最新内容

无人机自主控制与人工智能：系统性综述

无人机自主控制与人工智能：系统性综述

专知会员服务

8+阅读 · 今天7:25

巡飞弹与反无人机系统——现代战场的两大支柱

巡飞弹与反无人机系统——现代战场的两大支柱

专知会员服务

3+阅读 · 今天6:54

《打造“黄金舰队”》57页报告

《打造“黄金舰队”》57页报告

专知会员服务

2+阅读 · 今天6:52

《北约数字教官网络发展路径》128页报告

《北约数字教官网络发展路径》128页报告

专知会员服务

2+阅读 · 今天6:33

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

7+阅读 · 6月25日

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

6+阅读 · 6月25日

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

9+阅读 · 6月25日

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

7+阅读 · 6月25日

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

8+阅读 · 6月25日

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

8+阅读 · 6月25日

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

10+阅读 · 6月25日

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

9+阅读 · 6月25日

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

9+阅读 · 6月25日

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

10+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

10+阅读 · 6月24日

相关VIP内容

BES：让语言模型通过双向进化搜索自我改进

BES：让语言模型通过双向进化搜索自我改进

专知会员服务

8+阅读 · 5月30日

多模态大语言模型的自我改进：综述

多模态大语言模型的自我改进：综述

专知会员服务

28+阅读 · 2025年10月8日

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

专知会员服务

14+阅读 · 2025年3月6日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【ICML2021】具有持续进化策略的展开计算图的无偏梯度估计

专知会员服务

11+阅读 · 2021年8月10日

【ICML2021】SparseBERT: 自注意力机制的重要性分析再思考

专知会员服务

37+阅读 · 2021年5月15日

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

专知会员服务

76+阅读 · 2020年4月10日

【Google AI论文】无妥协的弱监督解缠，Weakly-Supervised Disentanglement Without Compromises

【Google AI论文】无妥协的弱监督解缠，Weakly-Supervised Disentanglement Without Compromises

专知会员服务

20+阅读 · 2020年2月12日

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

专知会员服务

25+阅读 · 2019年12月26日

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

专知会员服务

10+阅读 · 2019年10月26日

热门VIP内容

开通专知VIP会员享更多权益服务

巡飞弹与反无人机系统——现代战场的两大支柱

《北约数字教官网络发展路径》128页报告

无人机自主控制与人工智能：系统性综述

《打造“黄金舰队”》57页报告

相关资讯

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

专知

36+阅读 · 2022年2月7日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

解读自监督学习(Self-Supervised Learning)几篇相关paper

解读自监督学习(Self-Supervised Learning)几篇相关paper

CVer

25+阅读 · 2020年2月21日

【加州理工】什么是模仿学习(Imitation Learning（模仿学习), 这62页ppt带你了解进展，附下载

【加州理工】什么是模仿学习(Imitation Learning（模仿学习), 这62页ppt带你了解进展，附下载

专知

21+阅读 · 2019年11月14日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

自定义损失函数Gradient Boosting

自定义损失函数Gradient Boosting

AI研习社

14+阅读 · 2018年10月16日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

【论文笔记】自注意力机制学习句子embedding

【论文笔记】自注意力机制学习句子embedding

专知

15+阅读 · 2018年5月17日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

相关论文

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

Arxiv

0+阅读 · 6月23日

Improving Reasoning in Vision-Language Models via Perception Verified Self-Training

Arxiv

0+阅读 · 6月20日

MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis

Arxiv

0+阅读 · 6月19日

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

Arxiv

0+阅读 · 6月17日

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

Arxiv

0+阅读 · 6月17日

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

Arxiv

0+阅读 · 6月17日

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

Arxiv

0+阅读 · 6月17日

Self-CTRL: Self-Consistency Training with Reinforcement Learning

Arxiv

0+阅读 · 6月16日

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

Arxiv

0+阅读 · 6月8日

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Arxiv

13+阅读 · 2020年7月3日

相关基金

人类转录因子基因家族调控网络进化模式研究

国家自然科学基金

0+阅读 · 2015年12月31日

自相似序列的无理指数、分形及相关问题

国家自然科学基金

0+阅读 · 2015年12月31日

基于非独立同分布学习理论的图模型词义消歧及领域适应方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

个性化特征大数据支持下的交互式进化计算及其应用

国家自然科学基金

1+阅读 · 2015年12月31日

强调与对比影响语篇理解的认知过程及其神经机制

国家自然科学基金

4+阅读 · 2015年12月31日

可与MPSoC高度融合的片上自主测试-自主修复关键技术研究：针对自然、人为可靠性威胁

国家自然科学基金

0+阅读 · 2015年12月31日

进化算法行为分析及应用

国家自然科学基金

1+阅读 · 2015年12月31日

中文句子语义概念图自动构建方法及应用研究

国家自然科学基金

3+阅读 · 2014年12月31日

半监督进化文本聚类算法在动态多源文本分析上的研究

国家自然科学基金

2+阅读 · 2014年12月31日

Forward-Looking与Backward-Looking相结合的投资组合管理

国家自然科学基金

1+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员