Recent advances in web agents have introduced novel architectures and benchmarks that demonstrate progress in autonomous web navigation and interaction. However, most existing benchmarks prioritize effectiveness and accuracy and overlook safety and trustworthiness, both of which are essential for deploying web agents in enterprise settings. We present STWebAgentBench, a benchmark designed to evaluate the safety and trustworthiness of web agents across six dimensions that are critical for reliability in enterprise applications. The benchmark is grounded in a detailed framework that defines safe and trustworthy (ST) agent behavior. Our work extends WebArena with safety templates and evaluation functions that rigorously assess compliance with safety policies. We introduce two metrics: Completion Under Policy, which measures task success achieved while adhering to policies, and Risk Ratio, which quantifies policy violations across dimensions and provides actionable insights for addressing safety gaps. Our evaluation reveals that current state-of-the-art (SOTA) agents struggle with policy adherence and cannot yet be relied upon for critical business applications. We open-source the benchmark and invite the community to contribute, with the goal of fostering a new generation of safer, more trustworthy AI agents. All code, data, environment reproduction resources, and video demonstrations are available at https://sites.google.com/view/st-webagentbench/home.
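To make the two metrics concrete, the following minimal Python sketch shows one way they could be computed from per-task logs. It is an illustration under stated assumptions, not the benchmark's implementation: the `Episode` record, the function names, and the dimension labels used in the usage note are hypothetical. It also assumes that Completion Under Policy counts a task only when it is completed with zero violations, and that Risk Ratio is the number of violated policies in a dimension divided by the number of active policies in that dimension.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One agent run on one task (hypothetical record layout)."""
    completed: bool
    # dimension name -> number of policies of that dimension active in the task
    policies: dict[str, int] = field(default_factory=dict)
    # dimension name -> number of those policies the agent violated
    violations: dict[str, int] = field(default_factory=dict)


def completion_rate(episodes):
    """Plain task success rate, ignoring policies."""
    if not episodes:
        return 0.0
    return sum(e.completed for e in episodes) / len(episodes)


def completion_under_policy(episodes):
    """Assumed definition: share of tasks completed with zero violations."""
    if not episodes:
        return 0.0
    clean = sum(
        e.completed and sum(e.violations.values()) == 0 for e in episodes
    )
    return clean / len(episodes)


def risk_ratio(episodes):
    """Assumed definition: per dimension, violated / active policies."""
    active, violated = Counter(), Counter()
    for e in episodes:
        active.update(e.policies)      # adds the per-dimension counts
        violated.update(e.violations)
    return {dim: violated[dim] / n for dim, n in active.items() if n > 0}
```

For example, given four episodes where two complete but only one of those is violation-free, `completion_rate` returns 0.5 while `completion_under_policy` returns 0.25; the gap between the two is what the abstract's policy-adherence finding refers to. `risk_ratio` returns a dictionary keyed by dimension, so violations can be compared across dimensions rather than reported as a single aggregate.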