Computer-use agents operate over long horizons under noisy perception, multi-window contexts, and evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly re-solve routine subproblems, which leads to error accumulation and inefficiency. We present IntentCUA, a multi-agent computer-use framework that stabilizes long-horizon execution through intent-aligned plan memory. A Planner, a Plan-Optimizer, and a Critic coordinate over a shared memory that abstracts raw interaction traces into multi-view intent representations and reusable skills. At runtime, intent prototypes retrieve subgroup-aligned skills and inject them into partial plans, reducing redundant re-planning and mitigating error propagation across desktop applications. In end-to-end evaluations, IntentCUA achieved a 74.83% task success rate with a Step Efficiency Ratio of 0.91, outperforming RL-based and trajectory-centric baselines. Ablations show that multi-view intent abstraction and shared plan memory jointly improve execution stability, with the cooperative multi-agent loop providing the largest gains on long-horizon tasks. These results indicate that system-level intent abstraction and memory-grounded coordination are key to reliable and efficient desktop automation in large, dynamic environments.
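To make the runtime step concrete, the following is a minimal sketch of prototype-based skill retrieval and plan injection. It is an illustration under stated assumptions, not IntentCUA's implementation: the `Skill` and `IntentPrototype` types, the cosine-similarity matching, the `threshold` value, and the placeholder-slot convention are all hypothetical, since the abstract does not specify how prototypes are matched or how skills are spliced into a partial plan.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Skill:
    """Hypothetical reusable skill: a named sequence of UI steps."""
    name: str
    steps: list


@dataclass
class IntentPrototype:
    """Hypothetical intent prototype: an embedding plus the skills it indexes."""
    label: str
    embedding: list
    skills: list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_skill(query, prototypes, threshold=0.8):
    """Return the first skill of the nearest prototype, or None if no match.

    Assumption: nearest-prototype matching with a fixed similarity threshold.
    """
    best = max(prototypes, key=lambda p: cosine(query, p.embedding), default=None)
    if best is None or not best.skills:
        return None
    if cosine(query, best.embedding) < threshold:
        return None
    return best.skills[0]


def inject(partial_plan, slot, skill):
    """Replace the placeholder step at index `slot` with the skill's steps."""
    return partial_plan[:slot] + skill.steps + partial_plan[slot + 1:]


# Toy memory with two prototypes in a 2-D embedding space (illustrative only).
prototypes = [
    IntentPrototype("export_pdf", [1.0, 0.0],
                    [Skill("export_pdf", ["open File menu", "click Export", "choose PDF"])]),
    IntentPrototype("send_email", [0.0, 1.0],
                    [Skill("send_email", ["open mail client", "compose", "send"])]),
]

plan = ["open report", "<export>", "close app"]
skill = retrieve_skill([0.9, 0.1], prototypes)
if skill is not None:
    plan = inject(plan, 1, skill)
```

In this sketch, a query that matches no prototype above the threshold returns `None`, so the caller falls back to ordinary planning for that step rather than reusing a poorly aligned skill.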