ColorBrowserAgent: An Intelligent GUI Agent for Complex Long-Horizon Web Automation

The web browser serves as a primary interface for daily human activities, making its automation a critical frontier for Human-Centred AI. While Large Language Models (LLMs) have enabled autonomous agents to interact with web GUIs, their reliability in real-world scenarios is hampered by long-horizon instability and the vast heterogeneity of site designs. In this paper, we introduce ColorBrowserAgent, a framework designed for Collaborative Autonomy in complex web tasks. Our approach integrates two human-centred mechanisms: (1) Progressive Progress Summarization, which mimics human short-term memory to maintain coherence over extended interactions; and (2) Human-in-the-Loop Knowledge Adaptation, which bridges the knowledge gap in diverse environments by soliciting expert intervention only when necessary. This symbiotic design allows the agent to learn from human tips without extensive retraining, effectively combining the scalability of AI with the adaptability of human cognition. Evaluated on the WebArena benchmark using GPT-5, ColorBrowserAgent achieves a state-of-the-art success rate of 71.2%, demonstrating the efficacy of interactive human assistance in robust web automation.
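To make the two mechanisms concrete, the following is a minimal Python sketch of how a running progress summary and an on-demand tip store could be wired together. It is an illustration under stated assumptions, not the paper's implementation: `AgentState`, `update_summary`, `get_tips`, the `confident` flag, and the `ask_human` callback are all hypothetical names, and simple truncation stands in for whatever summarization the actual system performs (presumably an LLM rewrite).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AgentState:
    """Hypothetical agent memory: a goal, a bounded summary, and per-site tips."""
    goal: str
    summary: str = ""
    tips: Dict[str, List[str]] = field(default_factory=dict)


def update_summary(summary: str, step: int, action: str, outcome: str,
                   max_chars: int = 400) -> str:
    """Fold the latest step into the running summary and keep it bounded.

    Assumption: keeping only the most recent `max_chars` characters is a
    stand-in for a real summarizer that would compress older steps.
    """
    entry = f"[{step}] {action} -> {outcome}"
    merged = f"{summary} {entry}".strip()
    return merged[-max_chars:]


def get_tips(state: AgentState, site: str, confident: bool,
             ask_human: Callable[[str], Optional[str]]) -> List[str]:
    """Return stored tips for `site`, asking a human only when needed.

    The human is consulted only if the agent is not confident AND no tip
    for this site has been stored yet; the answer is cached for reuse.
    """
    if not confident and site not in state.tips:
        tip = ask_human(site)
        if tip:
            state.tips.setdefault(site, []).append(tip)
    return state.tips.get(site, [])


def build_prompt(state: AgentState, site: str, observation: str,
                 tips: List[str]) -> str:
    """Assemble the model input from the summary instead of the full history."""
    tip_text = "\n".join(f"- {t}" for t in tips) or "- (none)"
    return (
        f"Goal: {state.goal}\n"
        f"Progress so far: {state.summary or '(nothing yet)'}\n"
        f"Tips for {site}:\n{tip_text}\n"
        f"Current page: {observation}"
    )
```

In this sketch the prompt size stays bounded regardless of how many steps the task takes, because only the summary is carried forward, and a tip supplied once for a site is reused on later visits without asking again or retraining the model.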
