GUI agents enable end-to-end automation by directly perceiving and interacting with on-screen interfaces. However, these agents frequently access interfaces containing sensitive personal information, and screenshots are often transmitted to remote models, creating substantial privacy risks. These risks are particularly severe in GUI workflows: GUIs expose richer, more accessible private information, and privacy risks depend on interaction trajectories that span sequential scenes. We propose GUIGuard, a three-stage framework for privacy-preserving GUI agents: (1) privacy recognition, (2) privacy protection, and (3) task execution under protection. We further construct GUIGuard-Bench, a cross-platform benchmark with 630 trajectories and 13,830 screenshots, annotated with region-level privacy grounding and fine-grained labels for risk level, privacy category, and task necessity. Evaluations reveal that existing agents exhibit limited privacy recognition, with state-of-the-art models achieving only 13.3% accuracy on Android and 1.4% on PC. Under privacy protection, task-planning semantics can still be maintained, with closed-source models showing stronger semantic consistency than open-source ones. Case studies on MobileWorld show that carefully designed protection strategies achieve higher task accuracy while preserving privacy. Our results highlight privacy recognition as a critical bottleneck for practical GUI agents. Project: https://futuresis.github.io/GUIGuard-page/