Online Reinforcement Learning (RL) offers a promising paradigm for enhancing GUI agents through direct environment interaction. However, its effectiveness is severely hindered by inefficient credit assignment in long-horizon tasks and repetitive errors across tasks due to the lack of experience transfer. To address these challenges, we propose UI-Mem, a novel framework that enhances GUI online RL with a Hierarchical Experience Memory. Unlike traditional replay buffers, our memory accumulates structured knowledge, including high-level workflows, subtask skills, and failure patterns. These experiences are stored as parameterized templates that enable cross-task and cross-application transfer. To effectively integrate memory guidance into online RL, we introduce Stratified Group Sampling, which injects varying levels of guidance across trajectories within each rollout group to maintain outcome diversity, driving the unguided policy toward internalizing guided behaviors. Furthermore, a Self-Evolving Loop continuously abstracts novel strategies and errors to keep the memory aligned with the agent's evolving policy. Experiments on online GUI benchmarks demonstrate that UI-Mem significantly outperforms traditional RL baselines and static reuse strategies, with strong generalization to unseen applications. Project page: https://ui-mem.github.io
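To make the two core ideas concrete, here is a minimal illustrative sketch of a hierarchical experience memory and stratified group sampling. All class names, the three memory tiers' representation as keyword-matched templates, and the round-robin level assignment are assumptions for illustration only; the abstract does not specify the paper's actual data structures or sampling rule.

```python
import random
from dataclasses import dataclass, field

# Hypothetical sketch of the ideas in the abstract. The template-matching
# scheme and the guidance-level rotation below are illustrative assumptions,
# not the paper's implementation.

@dataclass
class ExperienceMemory:
    """Hierarchical memory: high-level workflows, subtask skills,
    and failure patterns, stored as parameterized templates."""
    workflows: dict = field(default_factory=dict)  # task pattern -> step outline
    skills: dict = field(default_factory=dict)     # subtask pattern -> action recipe
    failures: dict = field(default_factory=dict)   # error signature -> caution note

    def retrieve(self, task: str, level: str) -> list:
        """Return guidance matching the task at the requested granularity."""
        if level == "workflow":
            return [v for k, v in self.workflows.items() if k in task]
        if level == "skill":
            return [v for k, v in self.skills.items() if k in task]
        return []  # "none": an unguided rollout


def stratified_group_sample(task: str, memory: ExperienceMemory,
                            group_size: int = 8) -> list:
    """Assign varying guidance levels across the trajectories of one
    rollout group, so guided and unguided rollouts coexist and the
    group retains outcome diversity."""
    levels = ["none", "skill", "workflow"]
    assignments = [levels[i % len(levels)] for i in range(group_size)]
    return [(lvl, memory.retrieve(task, lvl)) for lvl in assignments]
```

In this sketch, the unguided ("none") trajectories are trained against guided ones in the same group, which is the mechanism the abstract describes for driving the policy to internalize guided behaviors without depending on memory at inference time.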