As autonomous agents become adept at understanding and interacting with graphical user interface (GUI) environments, a new era of automated task execution is emerging. Recent studies have demonstrated that Reinforcement Learning (RL) can effectively enhance agents' performance in dynamic, interactive GUI environments. However, these methods face two key limitations: (1) they treat the training data as a uniform set, overlooking the significant variation in difficulty across GUI tasks, which prevents the agent from adapting its learning process accordingly; and (2) most approaches collapse task-specific nuances into a single, coarse reward, leaving the agent with a uniform signal that yields inefficient policy updates. To address these limitations, we propose CRAFT-GUI, a curriculum learning framework based on Group Relative Policy Optimization (GRPO) that explicitly accounts for the varying difficulty across trajectories. To enable more fine-grained policy optimization, we design a reward function that combines simple rule-based signals with model-judged evaluation, providing richer and more nuanced feedback during training. Experimental results demonstrate that our method achieves significant improvements over previous state-of-the-art approaches, outperforming them by 5.6% on the public benchmark Android Control and by 10.3% on our internal online benchmark, respectively. These findings empirically validate the effectiveness of integrating reinforcement learning with curriculum learning for GUI interaction tasks.
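To make the hybrid reward design concrete, the following is a minimal Python sketch, not the authors' implementation, of how rule-based signals and a model-judged score might be blended into one graded reward. All names here (`Trajectory`, `rule_reward`, `judge_model`, the mixing weight `alpha`) are hypothetical illustrations, not identifiers from the paper.

```python
# Sketch of a hybrid reward: cheap rule-based checks combined with a
# model-judged score, so the policy receives graded rather than binary
# feedback. All names and the mixing scheme are assumptions.

from dataclasses import dataclass

@dataclass
class Trajectory:
    predicted_action: str   # action emitted by the agent, e.g. "click(120, 340)"
    reference_action: str   # ground-truth action for this step
    response_text: str      # full model response, including any reasoning

def rule_reward(traj: Trajectory) -> float:
    """Simple rule-based signals: exact action match plus a format check."""
    action_score = 1.0 if traj.predicted_action == traj.reference_action else 0.0
    format_score = 0.1 if traj.response_text.strip().endswith(")") else 0.0
    return action_score + format_score

def judged_reward(traj: Trajectory, judge_model) -> float:
    """Model-judged evaluation: a judge LLM scores partial correctness in [0, 1]."""
    prompt = (
        "Score how well the predicted GUI action matches the reference.\n"
        f"Predicted: {traj.predicted_action}\n"
        f"Reference: {traj.reference_action}\n"
        "Reply with a single number between 0 and 1."
    )
    return float(judge_model(prompt))  # judge_model is assumed to return a parseable score

def hybrid_reward(traj: Trajectory, judge_model, alpha: float = 0.5) -> float:
    """Blend rule-based and model-judged signals; alpha is a hypothetical weight."""
    return alpha * rule_reward(traj) + (1.0 - alpha) * judged_reward(traj, judge_model)
```

Under this kind of scheme, a trajectory that chooses a nearly correct action still receives partial credit from the judge, rather than the uniform zero a purely rule-based reward would assign, which is what enables the finer-grained policy updates described above.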