Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation - 专知论文

会员服务 ·

0

Agent · Learning · MoDELS · 回合 · 最优化 ·

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

翻译：暂无翻译

Marta Sumyk,Oleksandr Kosovan

from arxiv, Accepted to the 4th International Workshop on Generalizing from Limited Resources in the Open World (GLOW @ IJCAI 2026)

Computer-Use Agents (CUAs) execute high-level user goals by perceiving and acting directly within graphical user interfaces. However, reinforcement learning for CUAs remains difficult because open-ended desktop environments rarely provide scalable, machine-readable reward signals: task success is often visually grounded and hard to specify with handcrafted reward functions or dense manual labels. We propose an RL fine-tuning framework that uses autonomous vision-language evaluation as a scalable supervision signal for GUI agents. Given a final screenshot and the original instruction, a Vision-Language Model judges task completion and provides terminal feedback without task-specific heuristics or manual labels during policy optimization. Because autonomous evaluators are imperfect, we model their feedback as a noisy binary reward channel and derive a noise-corrected reward estimator for Proximal Policy Optimization. Experiments across macOSWorld, Windows Agent Arena, and OSWorld show that corrected evaluator rewards outperform both zero-shot baselines and raw evaluator rewards, improving success rates by an average of 12.6 percentage points over zero-shot performance and 5.1 points over raw evaluator fine-tuning. These results suggest that autonomous evaluation can serve as a practical reward signal for RL in GUI environments when evaluator noise is explicitly modeled and corrected.

翻译：暂无翻译

0

相关内容

Agent

2026 年 Agentic AI 工程师完全指南：一份系统化的学习路线图

2026 年 Agentic AI 工程师完全指南：一份系统化的学习路线图

专知会员服务

49+阅读 · 4月14日

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

专知会员服务

42+阅读 · 2025年10月17日

Al Agent：AI时代的软件革命

Al Agent：AI时代的软件革命

专知会员服务

48+阅读 · 2025年5月13日

AI行业专题报告：工具生态逐步完善，通用Agent曙光已现

AI行业专题报告：工具生态逐步完善，通用Agent曙光已现

专知会员服务

33+阅读 · 2025年3月27日

工具调用效果比肩GPT-4: 本地可微调的多模型协作工具学习agent框架

工具调用效果比肩GPT-4: 本地可微调的多模型协作工具学习agent框架

专知会员服务

38+阅读 · 2024年2月6日

作战 Agent 的学习算法研究进展与发展趋势

作战 Agent 的学习算法研究进展与发展趋势

专知会员服务

72+阅读 · 2023年10月3日

【Manning新书】自动机器学习实战，Automated Machine Learning in Action

【Manning新书】自动机器学习实战，Automated Machine Learning in Action

专知会员服务

95+阅读 · 2022年4月8日

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

专知会员服务

108+阅读 · 2020年6月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【2020新书】图机器学习，Graph-Powered Machine Learning

【2020新书】图机器学习，Graph-Powered Machine Learning

专知

76+阅读 · 2020年1月27日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

下载 | 866页《计算机视觉：原理，算法，应用，学习》第五版

下载 | 866页《计算机视觉：原理，算法，应用，学习》第五版

机器学习算法与Python学习

24+阅读 · 2019年1月1日

深度学习与计算机视觉任务应用综述

深度学习与计算机视觉任务应用综述

深度学习与NLP

51+阅读 · 2018年12月18日

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

AINLP

12+阅读 · 2018年11月12日

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

深度学习与NLP

15+阅读 · 2018年6月20日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

AI研习社

10+阅读 · 2018年3月6日

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

专知

27+阅读 · 2017年12月17日

面向物联网搜索的群智感知关键技术研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

基于日常移动平台的用户状态感知与软件协同技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

属性驱动的自适应多agent系统设计关键技术研究

国家自然科学基金

2+阅读 · 2015年12月31日

面向大数据的知识表示、推理、在线学习理论及应用研究

国家自然科学基金

12+阅读 · 2014年12月31日

移动与可穿戴计算中Eyes-Free交互界面研究

国家自然科学基金

0+阅读 · 2014年12月31日

泛在计算环境中社会化驱动的情境感知个性化信息服务研究

国家自然科学基金

2+阅读 · 2014年12月31日

面向人与Agent混合的多团队协作仿真训练方法研究

国家自然科学基金

19+阅读 · 2012年12月31日

基于群体智能的多Agent协作模型与适应性研究

国家自然科学基金

18+阅读 · 2009年12月31日

强化学习关键技术及其在机器人行为学习中的应用

国家自然科学基金

23+阅读 · 2009年12月31日

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Arxiv

0+阅读 · 6月22日

Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?

Arxiv

0+阅读 · 6月22日

Self-Evolution for Multi-Turn Tool-Calling Agents via Divergence-Point Preference Learning

Arxiv

0+阅读 · 6月22日

MacAgentBench: Benchmarking AI Agents on Real-World macOS Desktop

Arxiv

0+阅读 · 6月21日

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

Arxiv

0+阅读 · 6月18日

Learning User Simulators with Turing Rewards

Arxiv

0+阅读 · 6月17日

VISUALSKILL: Multimodal Skills for Computer-Use Agents

Arxiv

0+阅读 · 6月16日

SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents

Arxiv

0+阅读 · 6月16日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

46+阅读 · 2022年8月2日

Explainable Recommender Systems via Resolving Learning Representations

Arxiv

13+阅读 · 2020年8月21日

VIP会员

文章信息

相关主题

最新内容

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

2+阅读 · 今天11:43

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

2+阅读 · 今天11:41

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

5+阅读 · 今天6:30

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

5+阅读 · 今天6:18

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

6+阅读 · 今天6:08

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

6+阅读 · 今天5:54

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

7+阅读 · 今天5:22

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

7+阅读 · 今天5:15

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

7+阅读 · 今天3:42

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

5+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

7+阅读 · 6月24日

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

10+阅读 · 6月24日

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

9+阅读 · 6月24日

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

7+阅读 · 6月24日

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

9+阅读 · 6月24日

相关VIP内容

2026 年 Agentic AI 工程师完全指南：一份系统化的学习路线图

2026 年 Agentic AI 工程师完全指南：一份系统化的学习路线图

专知会员服务

49+阅读 · 4月14日

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

最新新Agent综述！76页327篇论文梳理，北交大桑基韬教授团队发布《迈向模型原生智能体式人工智能的范式转变综述》

专知会员服务

42+阅读 · 2025年10月17日

Al Agent：AI时代的软件革命

Al Agent：AI时代的软件革命

专知会员服务

48+阅读 · 2025年5月13日

AI行业专题报告：工具生态逐步完善，通用Agent曙光已现

AI行业专题报告：工具生态逐步完善，通用Agent曙光已现

专知会员服务

33+阅读 · 2025年3月27日

工具调用效果比肩GPT-4: 本地可微调的多模型协作工具学习agent框架

工具调用效果比肩GPT-4: 本地可微调的多模型协作工具学习agent框架

专知会员服务

38+阅读 · 2024年2月6日

作战 Agent 的学习算法研究进展与发展趋势

作战 Agent 的学习算法研究进展与发展趋势

专知会员服务

72+阅读 · 2023年10月3日

【Manning新书】自动机器学习实战，Automated Machine Learning in Action

【Manning新书】自动机器学习实战，Automated Machine Learning in Action

专知会员服务

95+阅读 · 2022年4月8日

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

专知会员服务

108+阅读 · 2020年6月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

网状网络及其在军事领域的运用

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

相关资讯

【2020新书】图机器学习，Graph-Powered Machine Learning

【2020新书】图机器学习，Graph-Powered Machine Learning

专知

76+阅读 · 2020年1月27日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

下载 | 866页《计算机视觉：原理，算法，应用，学习》第五版

下载 | 866页《计算机视觉：原理，算法，应用，学习》第五版

机器学习算法与Python学习

24+阅读 · 2019年1月1日

深度学习与计算机视觉任务应用综述

深度学习与计算机视觉任务应用综述

深度学习与NLP

51+阅读 · 2018年12月18日

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

AINLP

12+阅读 · 2018年11月12日

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

深度学习与NLP

15+阅读 · 2018年6月20日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

Github 项目推荐 | 真实全景图像强化学习 AI 平台 —— Matterport3DSimulator

AI研习社

10+阅读 · 2018年3月6日

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

【下载】面向Open AI, TensorFlow, Keras的强化学习书籍《Reinforcement Learning》

专知

27+阅读 · 2017年12月17日

相关论文

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Arxiv

0+阅读 · 6月22日

Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?

Arxiv

0+阅读 · 6月22日

Self-Evolution for Multi-Turn Tool-Calling Agents via Divergence-Point Preference Learning

Arxiv

0+阅读 · 6月22日

MacAgentBench: Benchmarking AI Agents on Real-World macOS Desktop

Arxiv

0+阅读 · 6月21日

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

Arxiv

0+阅读 · 6月18日

Learning User Simulators with Turing Rewards

Arxiv

0+阅读 · 6月17日

VISUALSKILL: Multimodal Skills for Computer-Use Agents

Arxiv

0+阅读 · 6月16日

SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents

Arxiv

0+阅读 · 6月16日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

46+阅读 · 2022年8月2日

Explainable Recommender Systems via Resolving Learning Representations

Arxiv

13+阅读 · 2020年8月21日

相关基金

面向物联网搜索的群智感知关键技术研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

基于日常移动平台的用户状态感知与软件协同技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

属性驱动的自适应多agent系统设计关键技术研究

国家自然科学基金

2+阅读 · 2015年12月31日

面向大数据的知识表示、推理、在线学习理论及应用研究

国家自然科学基金

12+阅读 · 2014年12月31日

移动与可穿戴计算中Eyes-Free交互界面研究

国家自然科学基金

0+阅读 · 2014年12月31日

泛在计算环境中社会化驱动的情境感知个性化信息服务研究

国家自然科学基金

2+阅读 · 2014年12月31日

面向人与Agent混合的多团队协作仿真训练方法研究

国家自然科学基金

19+阅读 · 2012年12月31日

基于群体智能的多Agent协作模型与适应性研究

国家自然科学基金

18+阅读 · 2009年12月31日

强化学习关键技术及其在机器人行为学习中的应用

国家自然科学基金

23+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员