How Much LLM Does a Self-Revising Agent Actually Need? - 专知论文

会员服务 ·

0

How Much LLM Does a Self-Revising Agent Actually Need?

翻译：暂无翻译

Seongwoo Jeong,Seonil Son

from arxiv, WIP

Recent LLM-based agents often place world modeling, planning, and reflection inside a single language model loop. This can produce capable behavior, but it makes a basic scientific question difficult to answer: which part of the agent's competence actually comes from the LLM, and which part comes from explicit structure around it? We study this question not by claiming a general answer, but by making it empirically tractable. We introduce a declared reflective runtime protocol that externalizes agent state, confidence signals, guarded actions, and hypothetical transitions into inspectable runtime structure. We instantiate this protocol in a declarative runtime and evaluate it on noisy Collaborative Battleship [4] using four progressively structured agents over 54 games (18 boards $\times$ 3 seeds). The resulting decomposition isolates four components: posterior belief tracking, explicit world-model planning, symbolic in-episode reflection, and sparse LLM-based revision. Across this decomposition, explicit world-model planning improves substantially over a greedy posterior-following baseline (+24.1pp win rate, +0.017 F1). Symbolic reflection operates as a real runtime mechanism -- with prediction tracking, confidence gating, and guarded revision actions -- even though its current revision presets are not yet net-positive in aggregate. Adding conditional LLM revision at about 4.3\% of turns yields only a small and non-monotonic change: average F1 rises slightly (+0.005) while win rate drops (31$\rightarrow$29 out of 54). These results suggest a methodological contribution rather than a leaderboard claim: externalizing reflection turns otherwise latent agent behavior into inspectable runtime structure, allowing the marginal role of LLM intervention to be studied directly.

翻译：暂无翻译

0

相关内容

从静态模板到动态运行时图：大语言模型智能体（LLM Agents）工作流优化综述

从静态模板到动态运行时图：大语言模型智能体（LLM Agents）工作流优化综述

专知会员服务

22+阅读 · 3月30日

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

专知会员服务

40+阅读 · 2025年10月17日

LLM/智能体作为数据分析师：综述

LLM/智能体作为数据分析师：综述

专知会员服务

38+阅读 · 2025年9月30日

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

专知会员服务

22+阅读 · 2025年3月6日

【ICLR2025】LLMS能否识别您的偏好？评估LLMS中的个性化偏好遵循能力

【ICLR2025】LLMS能否识别您的偏好？评估LLMS中的个性化偏好遵循能力

专知会员服务

14+阅读 · 2025年2月14日

【新书】构建LLM应用：使用大型语言模型创建智能应用和代理，312页pdf

【新书】构建LLM应用：使用大型语言模型创建智能应用和代理，312页pdf

专知会员服务

100+阅读 · 2024年6月15日

KG-Agent：面向KG复杂推理的高效自治代理框架

KG-Agent：面向KG复杂推理的高效自治代理框架

专知会员服务

35+阅读 · 2024年6月1日

Al Agent--大模型时代重要落地方向

Al Agent--大模型时代重要落地方向

专知会员服务

106+阅读 · 2024年4月8日

AI Agent，大模型时代重要落地方向, 42页ppt

AI Agent，大模型时代重要落地方向, 42页ppt

专知会员服务

290+阅读 · 2023年10月12日

AI Agent下一个热点？复旦最新86页《大型语言模型智能体的崛起与潜力》综述，详述LLM Agent: 大脑、感知和行动

AI Agent下一个热点？复旦最新86页《大型语言模型智能体的崛起与潜力》综述，详述LLM Agent: 大脑、感知和行动

专知会员服务

170+阅读 · 2023年9月15日

解读自监督学习(Self-Supervised Learning)几篇相关paper

解读自监督学习(Self-Supervised Learning)几篇相关paper

CVer

25+阅读 · 2020年2月21日

一文读懂自注意力机制：8大步骤图解+代码

一文读懂自注意力机制：8大步骤图解+代码

新智元

153+阅读 · 2019年11月26日

绝对干货！NLP预训练模型：从transformer到albert

绝对干货！NLP预训练模型：从transformer到albert

新智元

13+阅读 · 2019年11月10日

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

专知

36+阅读 · 2019年9月29日

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

哈工大SCIR

17+阅读 · 2019年9月23日

百度提出ERNIE，多项中文NLP任务表现出色（已开源）

百度提出ERNIE，多项中文NLP任务表现出色（已开源）

AI100

33+阅读 · 2019年3月16日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

泡泡机器人SLAM

11+阅读 · 2018年10月6日

一文读懂「Attention is All You Need」| 附代码实现

一文读懂「Attention is All You Need」| 附代码实现

PaperWeekly

37+阅读 · 2018年1月10日

自然语言处理（二）机器翻译篇 (NLP: machine translation)

自然语言处理（二）机器翻译篇 (NLP: machine translation)

DeepLearning中文论坛

12+阅读 · 2015年7月1日

自相似序列的无理指数、分形及相关问题

国家自然科学基金

0+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

属性驱动的自适应多agent系统设计关键技术研究

国家自然科学基金

2+阅读 · 2015年12月31日

移动与可穿戴计算中Eyes-Free交互界面研究

国家自然科学基金

0+阅读 · 2014年12月31日

物联网关键技术RFID系统安全测试的仿真架构.评估模型和受攻击模式的研究和实践

国家自然科学基金

2+阅读 · 2014年12月31日

随机Helmholtz型问题的数值方法

国家自然科学基金

0+阅读 · 2014年12月31日

可重构的环境自适应RS码软判决译码器研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向人与Agent混合的多团队协作仿真训练方法研究

国家自然科学基金

19+阅读 · 2012年12月31日

基于群体智能的多Agent协作模型与适应性研究

国家自然科学基金

18+阅读 · 2009年12月31日

SoK: Security of Autonomous LLM Agents in Agentic Commerce

Arxiv

0+阅读 · 5月1日

Survey on Evaluation of LLM-based Agents

Arxiv

0+阅读 · 4月23日

AgenTEE: Confidential LLM Agent Execution on Edge Devices

Arxiv

0+阅读 · 4月20日

Coalition Formation in LLM Agent Networks: Stability Analysis and Convergence Guarantees

Arxiv

0+阅读 · 4月15日

Competition and Cooperation of LLM Agents in Games

Arxiv

0+阅读 · 4月11日

Quine: Realizing LLM Agents as Native POSIX Processes

Arxiv

0+阅读 · 4月9日

Do Agents Repair When Challenged -- or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum

Arxiv

0+阅读 · 4月1日

When Can We Trust LLM Graders? Calibrating Confidence for Automated Assessment

Arxiv

0+阅读 · 3月31日

Agent Audit: A Security Analysis System for LLM Agent Applications

Arxiv

0+阅读 · 3月24日

Is Your LLM-as-a-Recommender Agent Trustable? LLMs' Recommendation is Easily Hacked by Biases (Preferences)

Arxiv

0+阅读 · 3月20日

VIP会员

文章信息

相关主题

最新内容

DeepSeek 版Claude Code，免费小白安装教程来了！

DeepSeek 版Claude Code，免费小白安装教程来了！

专知会员服务

1+阅读 · 今天16:16

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

专知会员服务

1+阅读 · 今天16:08

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

专知会员服务

0+阅读 · 今天16:08

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

专知会员服务

2+阅读 · 今天14:09

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

专知会员服务

4+阅读 · 今天14:04

《美空军条令出版物 2-0：情报（2026版）》

《美空军条令出版物 2-0：情报（2026版）》

专知会员服务

7+阅读 · 今天13:54

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

专知会员服务

3+阅读 · 今天13:46

帕兰提尔 Gotham：一个游戏规则改变器

帕兰提尔 Gotham：一个游戏规则改变器

专知会员服务

5+阅读 · 今天13:34

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

专知会员服务

2+阅读 · 今天13:02

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

专知会员服务

2+阅读 · 今天12:07

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

专知会员服务

8+阅读 · 5月4日

【综述】机器人学习中的世界模型：全面综述

【综述】机器人学习中的世界模型：全面综述

专知会员服务

10+阅读 · 5月4日

伊朗的导弹-无人机行动及其对美国威慑的影响

伊朗的导弹-无人机行动及其对美国威慑的影响

专知会员服务

8+阅读 · 5月4日

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

专知会员服务

8+阅读 · 5月4日

战争贩子：2026年第一季度美国对中东潜在军售激增

战争贩子：2026年第一季度美国对中东潜在军售激增

专知会员服务

6+阅读 · 5月4日

相关VIP内容

从静态模板到动态运行时图：大语言模型智能体（LLM Agents）工作流优化综述

从静态模板到动态运行时图：大语言模型智能体（LLM Agents）工作流优化综述

专知会员服务

22+阅读 · 3月30日

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

专知会员服务

40+阅读 · 2025年10月17日

LLM/智能体作为数据分析师：综述

LLM/智能体作为数据分析师：综述

专知会员服务

38+阅读 · 2025年9月30日

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

从自我进化视角出发，全面解析LLM的推理能力技术演进路径

专知会员服务

22+阅读 · 2025年3月6日

【ICLR2025】LLMS能否识别您的偏好？评估LLMS中的个性化偏好遵循能力

【ICLR2025】LLMS能否识别您的偏好？评估LLMS中的个性化偏好遵循能力

专知会员服务

14+阅读 · 2025年2月14日

【新书】构建LLM应用：使用大型语言模型创建智能应用和代理，312页pdf

【新书】构建LLM应用：使用大型语言模型创建智能应用和代理，312页pdf

专知会员服务

100+阅读 · 2024年6月15日

KG-Agent：面向KG复杂推理的高效自治代理框架

KG-Agent：面向KG复杂推理的高效自治代理框架

专知会员服务

35+阅读 · 2024年6月1日

Al Agent--大模型时代重要落地方向

Al Agent--大模型时代重要落地方向

专知会员服务

106+阅读 · 2024年4月8日

AI Agent，大模型时代重要落地方向, 42页ppt

AI Agent，大模型时代重要落地方向, 42页ppt

专知会员服务

290+阅读 · 2023年10月12日

AI Agent下一个热点？复旦最新86页《大型语言模型智能体的崛起与潜力》综述，详述LLM Agent: 大脑、感知和行动

AI Agent下一个热点？复旦最新86页《大型语言模型智能体的崛起与潜力》综述，详述LLM Agent: 大脑、感知和行动

专知会员服务

170+阅读 · 2023年9月15日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

DeepSeek 版Claude Code，免费小白安装教程来了！

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

相关资讯

解读自监督学习(Self-Supervised Learning)几篇相关paper

解读自监督学习(Self-Supervised Learning)几篇相关paper

CVer

25+阅读 · 2020年2月21日

一文读懂自注意力机制：8大步骤图解+代码

一文读懂自注意力机制：8大步骤图解+代码

新智元

153+阅读 · 2019年11月26日

绝对干货！NLP预训练模型：从transformer到albert

绝对干货！NLP预训练模型：从transformer到albert

新智元

13+阅读 · 2019年11月10日

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

最新必读【预训练语言模型(BERT/XLNet等)】论文，Google/微软/华为ICLR2020提交论文

专知

36+阅读 · 2019年9月29日

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

赛尔原创 | EMNLP 2019 基于上下文感知的变分自编码器建模事件背景知识进行If-Then类型常识推理

哈工大SCIR

17+阅读 · 2019年9月23日

百度提出ERNIE，多项中文NLP任务表现出色（已开源）

百度提出ERNIE，多项中文NLP任务表现出色（已开源）

AI100

33+阅读 · 2019年3月16日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

泡泡机器人SLAM

11+阅读 · 2018年10月6日

一文读懂「Attention is All You Need」| 附代码实现

一文读懂「Attention is All You Need」| 附代码实现

PaperWeekly

37+阅读 · 2018年1月10日

自然语言处理（二）机器翻译篇 (NLP: machine translation)

自然语言处理（二）机器翻译篇 (NLP: machine translation)

DeepLearning中文论坛

12+阅读 · 2015年7月1日

相关论文

SoK: Security of Autonomous LLM Agents in Agentic Commerce

Arxiv

0+阅读 · 5月1日

Survey on Evaluation of LLM-based Agents

Arxiv

0+阅读 · 4月23日

AgenTEE: Confidential LLM Agent Execution on Edge Devices

Arxiv

0+阅读 · 4月20日

Coalition Formation in LLM Agent Networks: Stability Analysis and Convergence Guarantees

Arxiv

0+阅读 · 4月15日

Competition and Cooperation of LLM Agents in Games

Arxiv

0+阅读 · 4月11日

Quine: Realizing LLM Agents as Native POSIX Processes

Arxiv

0+阅读 · 4月9日

Do Agents Repair When Challenged -- or Just Reply? Challenge, Repair, and Public Correction in a Deployed Agent Forum

Arxiv

0+阅读 · 4月1日

When Can We Trust LLM Graders? Calibrating Confidence for Automated Assessment

Arxiv

0+阅读 · 3月31日

Agent Audit: A Security Analysis System for LLM Agent Applications

Arxiv

0+阅读 · 3月24日

Is Your LLM-as-a-Recommender Agent Trustable? LLMs' Recommendation is Easily Hacked by Biases (Preferences)

Arxiv

0+阅读 · 3月20日

相关基金

自相似序列的无理指数、分形及相关问题

国家自然科学基金

0+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

属性驱动的自适应多agent系统设计关键技术研究

国家自然科学基金

2+阅读 · 2015年12月31日

移动与可穿戴计算中Eyes-Free交互界面研究

国家自然科学基金

0+阅读 · 2014年12月31日

物联网关键技术RFID系统安全测试的仿真架构.评估模型和受攻击模式的研究和实践

国家自然科学基金

2+阅读 · 2014年12月31日

随机Helmholtz型问题的数值方法

国家自然科学基金

0+阅读 · 2014年12月31日

可重构的环境自适应RS码软判决译码器研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向人与Agent混合的多团队协作仿真训练方法研究

国家自然科学基金

19+阅读 · 2012年12月31日

基于群体智能的多Agent协作模型与适应性研究

国家自然科学基金

18+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员