Mobile Graphical User Interface (GUI) agents have demonstrated strong capabilities in automating complex smartphone tasks by leveraging multimodal large language models (MLLMs) and system-level control interfaces. However, this paradigm introduces significant privacy risks, as agents typically capture and process entire screen contents, thereby exposing sensitive personal data such as phone numbers, addresses, messages, and financial information. Existing defenses either reduce UI exposure, obfuscate only task-irrelevant content, or rely on user authorization, but none can protect task-critical sensitive information while preserving seamless agent usability. We propose an anonymization-based privacy protection framework that enforces the principle of available-but-invisible access to sensitive data: sensitive information remains usable for task execution but is never directly visible to the cloud-based agent. Our system detects sensitive UI content using a PII-aware recognition model and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c) that retain semantic categories while removing identifying details. A layered architecture comprising a PII Detector, UI Transformer, Secure Interaction Proxy, and Privacy Gatekeeper ensures consistent anonymization across user instructions, XML hierarchies, and screenshots, mediates all agent actions over anonymized interfaces, and supports narrowly scoped local computations when reasoning over raw values is necessary. Extensive experiments on the AndroidLab and PrivScreen benchmarks show that our framework substantially reduces privacy leakage across multiple models while incurring only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods. Code available at: https://github.com/one-step-beh1nd/gui_privacy_protection
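The deterministic, type-preserving placeholder scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`anonymize`, `deanonymize`), the use of SHA-256, and the 5-character hash truncation are all assumptions made for the example.

```python
import hashlib

def anonymize(value: str, pii_type: str, mapping: dict) -> str:
    """Replace a sensitive value with a deterministic, type-preserving
    placeholder such as PHONE_NUMBER#a1b2c. The same raw value always
    maps to the same token, so references stay consistent across the
    instruction, XML hierarchy, and screenshot text.
    (Illustrative sketch; hash choice and truncation are assumptions.)"""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:5]
    token = f"{pii_type}#{digest}"
    mapping[token] = value  # kept locally so raw values never leave the device
    return token

def deanonymize(text: str, mapping: dict) -> str:
    """Restore raw values locally before an action is executed, so the
    cloud-based agent only ever sees the anonymized tokens."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

mapping: dict = {}
t1 = anonymize("555-123-4567", "PHONE_NUMBER", mapping)
t2 = anonymize("555-123-4567", "PHONE_NUMBER", mapping)
print(t1 == t2)  # determinism: repeated occurrences map to one token
print(deanonymize(f"call {t1}", mapping))  # local round-trip restores the raw value
```

Determinism matters here: if the same phone number appears twice on screen, both occurrences receive an identical token, letting the agent reason about equality without ever seeing the underlying value.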