As autonomous agents become adept at understanding and interacting with graphical user interface (GUI) environments, a new era of automated task execution is emerging. Recent studies have demonstrated that Reinforcement Learning (RL) can effectively enhance agents' performance in dynamic, interactive GUI environments. However, these methods face two key limitations: (1) they treat the training data as a uniform set, overlooking the significant variation in difficulty across GUI tasks, which prevents the agent from adapting its learning process accordingly; and (2) most approaches collapse task-specific nuances into a single, coarse reward, leaving the agent with a uniform signal that yields inefficient policy updates. To address these limitations, we propose CRAFT-GUI, a curriculum learning framework based on Group Relative Policy Optimization (GRPO) that explicitly accounts for the varying difficulty across trajectories. To enable more fine-grained policy optimization, we design a reward function that combines simple rule-based signals with model-judged evaluation, providing richer and more nuanced feedback during training. Experimental results demonstrate that our method achieves significant improvements over previous state-of-the-art approaches, outperforming them by 5.6% on the public benchmark Android Control and by 10.3% on our internal online benchmark, respectively. These findings empirically validate the effectiveness of integrating reinforcement learning with curriculum learning for GUI interaction tasks.
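To make the hybrid reward design concrete, the following is a minimal Python sketch, not the authors' implementation, of how rule-based signals and a model-judged score might be blended into one graded reward. All names here (`Trajectory`, `rule_reward`, `judge_model`, the mixing weight `alpha`) are hypothetical illustrations, not identifiers from the paper.

```python
# Sketch of a hybrid reward: cheap rule-based checks combined with a
# model-judged score, so the policy receives graded rather than binary
# feedback. All names and the mixing scheme are assumptions.

from dataclasses import dataclass

@dataclass
class Trajectory:
    predicted_action: str   # action emitted by the agent, e.g. "click(120, 340)"
    reference_action: str   # ground-truth action for this step
    response_text: str      # full model response, including any reasoning

def rule_reward(traj: Trajectory) -> float:
    """Simple rule-based signals: exact action match plus a format check."""
    action_score = 1.0 if traj.predicted_action == traj.reference_action else 0.0
    format_score = 0.1 if traj.response_text.strip().endswith(")") else 0.0
    return action_score + format_score

def judged_reward(traj: Trajectory, judge_model) -> float:
    """Model-judged evaluation: a judge LLM scores partial correctness in [0, 1]."""
    prompt = (
        "Score how well the predicted GUI action matches the reference.\n"
        f"Predicted: {traj.predicted_action}\n"
        f"Reference: {traj.reference_action}\n"
        "Reply with a single number between 0 and 1."
    )
    return float(judge_model(prompt))  # judge_model is assumed to return a parseable score

def hybrid_reward(traj: Trajectory, judge_model, alpha: float = 0.5) -> float:
    """Blend rule-based and model-judged signals; alpha is a hypothetical weight."""
    return alpha * rule_reward(traj) + (1.0 - alpha) * judged_reward(traj, judge_model)
```

Under this kind of scheme, a trajectory that chooses a nearly correct action still receives partial credit from the judge, rather than the uniform zero a purely rule-based reward would assign, which is what enables the finer-grained policy updates described above.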