WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

From arXiv. Accepted at PETS 2026. Project page: https://wiscprivacy.com/webspeval/

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully execute user-facing website security and privacy tasks, such as managing cookie preferences, configuring privacy-sensitive account settings, or revoking inactive sessions. To address this gap, we introduce WebSP-Eval, an evaluation framework for measuring web agent performance on website security and privacy tasks. WebSP-Eval comprises (1) a manually crafted dataset of 200 task instances across 28 websites; (2) a robust agentic system that supports account and initial-state management across runs using a custom Google Chrome extension; and (3) an automated evaluator. We evaluate 8 web agent instantiations built on state-of-the-art multimodal large language models, conducting a fine-grained analysis across websites, task categories, and UI elements. Our evaluation reveals that current models have limited autonomous exploration capabilities, which prevents them from reliably solving website security and privacy tasks, and that they struggle with specific task categories and websites. Crucially, we identify stateful UI elements as a primary reason for agent failure, with toggles causing task failure rates above 45% across many models.
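To illustrate why stateful UI elements and initial-state management matter, here is a minimal toy sketch in Python. It is not taken from the paper or its codebase: the `Toggle` class, the two agent policies, and the `evaluate` check are all hypothetical names introduced for illustration. The sketch assumes a toggle whose click flips its state, and an outcome-based check that looks only at the final state.

```python
from dataclasses import dataclass


@dataclass
class Toggle:
    """Hypothetical stand-in for a stateful UI control (e.g., a privacy toggle)."""
    label: str
    enabled: bool

    def click(self) -> None:
        # A click flips the current state; it does not set a chosen value.
        self.enabled = not self.enabled


def blind_agent(toggle: Toggle) -> None:
    """Hypothetical policy: clicks without reading the current state."""
    toggle.click()


def state_aware_agent(toggle: Toggle, desired: bool) -> None:
    """Hypothetical policy: clicks only if the state differs from the goal."""
    if toggle.enabled != desired:
        toggle.click()


def evaluate(toggle: Toggle, desired: bool) -> bool:
    """Outcome-based check: success depends on the final state alone."""
    return toggle.enabled == desired


def run_trial(agent_name: str, initial: bool, desired: bool) -> bool:
    """Reset the toggle to a known initial state, run one agent, then score it."""
    toggle = Toggle(label="Personalized ads", enabled=initial)
    if agent_name == "blind":
        blind_agent(toggle)
    else:
        state_aware_agent(toggle, desired)
    return evaluate(toggle, desired)


if __name__ == "__main__":
    # Task: "turn off personalized ads" (desired state is False).
    for initial in (True, False):
        for agent in ("blind", "aware"):
            print(agent, "initial =", initial, "success =",
                  run_trial(agent, initial, desired=False))
```

In this toy setup, the blind policy succeeds only when the toggle happens to start in the opposite state, which is one reason a benchmark would want to control the initial state of each run rather than leave it to whatever the previous run left behind.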
