Learning with Challenges: Adaptive Difficulty-Aware Data Generation for Mobile GUI Agent Training

Large-scale, high-quality interaction trajectories are essential for advancing mobile Graphical User Interface (GUI) agents. While existing methods typically rely on labor-intensive human demonstrations or automated model exploration to generate GUI trajectories, they lack fine-grained control over task difficulty. This fundamentally restricts learning effectiveness due to the mismatch between the training difficulty and the agent's capabilities. Inspired by how humans acquire skills through progressively challenging tasks, we propose MobileGen, a novel data generation framework that adaptively aligns training difficulty with the GUI agent's capability frontier. Specifically, MobileGen explicitly decouples task difficulty into structural (e.g., trajectory length) and semantic (e.g., task goal) dimensions. It then iteratively evaluates the agent on a curated prior dataset to construct a systematic profile of its capability frontier across these two dimensions. With this profile, the probability distribution of task difficulty is adaptively computed, from which the target difficulty for the next round of training can be sampled. Guided by the sampled difficulty, a multi-agent controllable generator is finally used to synthesize high-quality interaction trajectories along with corresponding task instructions. Extensive experiments show that MobileGen consistently outperforms existing data generation methods by improving the average performance of GUI agents by 1.57 times across multiple challenging benchmarks. This highlights the importance of capability-aligned data generation for effective mobile GUI agent training.
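
As an illustration of the profile-then-sample step described above, the following minimal sketch turns a capability profile into a sampling distribution over difficulty buckets. The abstract does not give the actual formula, so everything here is an assumption: the bucket names, the success rates, the `sharpness` parameter, and the rule that buckets with a success rate near 0.5 lie on the capability frontier and deserve the most weight. It is not MobileGen's implementation.

```python
import math
import random


def frontier_weights(success_rate, sharpness=4.0):
    """Map per-bucket success rates to sampling probabilities.

    Assumption (not from the paper): a bucket is "on the frontier" when the
    agent succeeds about half the time, so weight decays with |rate - 0.5|.
    """
    scores = {
        bucket: math.exp(-sharpness * abs(rate - 0.5))
        for bucket, rate in success_rate.items()
    }
    total = sum(scores.values())
    # Normalize so the weights form a probability distribution.
    return {bucket: score / total for bucket, score in scores.items()}


def sample_difficulty(weights, rng):
    """Draw one (structural, semantic) difficulty bucket from the distribution."""
    buckets = list(weights)
    return rng.choices(buckets, weights=[weights[b] for b in buckets], k=1)[0]


# Hypothetical profile: (structural level, semantic level) -> success rate
# measured on a prior evaluation set.
profile = {
    ("short", "simple"): 0.95,   # already mastered
    ("short", "complex"): 0.60,
    ("long", "simple"): 0.45,    # closest to the assumed frontier
    ("long", "complex"): 0.05,   # currently out of reach
}

weights = frontier_weights(profile)
rng = random.Random(0)  # seeded only to make the sketch reproducible
target = sample_difficulty(weights, rng)
```

Under these assumptions, tasks the agent almost always solves or almost always fails are sampled rarely, while the sampled `target` would then be passed to the trajectory generator as the difficulty to synthesize for the next training round.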
