The emergence of AI-driven web automation through Large Language Models (LLMs) offers unprecedented opportunities for optimizing digital workflows. However, deploying such systems in real-world industry environments presents four core challenges: (1) ensuring consistent execution, (2) accurately identifying critical HTML elements, (3) achieving human-level accuracy so that operations can be automated at scale, and (4) the lack of comprehensive benchmarking data on internal web applications. Existing solutions are primarily tailored to well-designed, consumer-facing websites (e.g., Amazon.com, Apple.com) and fall short in addressing the complexity of poorly designed internal web interfaces. To address these limitations, we present Cybernaut, a novel framework that ensures high execution consistency in web automation agents designed for robust enterprise use. Our contributions are threefold: (1) a Standard Operating Procedure (SOP) generator that converts user demonstrations into reliable automation instructions for linear browsing tasks, (2) a high-precision HTML DOM element recognition system tailored to complex web interfaces, and (3) a quantitative metric for assessing execution consistency. Empirical evaluation on our internal benchmark shows that our framework yields a 23.2% relative improvement in task execution success rate over browser_use (from 72% to 88.68%). Cybernaut identifies consistent execution patterns with 84.7% accuracy, enabling reliable confidence assessment and adaptive guidance during task execution in real-world systems. These results highlight Cybernaut's effectiveness in enterprise-scale web automation and lay a foundation for future advances in the field.