Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

The evolution of Large Language Models (LLMs) has shifted mobile computing from app-centric interactions toward system-level autonomous agents. Current implementations predominantly rely on a "Screen-as-Interface" paradigm, which inherits structural vulnerabilities and conflicts with the mobile ecosystem's economic foundations. In this paper, we conduct a systematic security analysis of state-of-the-art mobile agents, using Doubao Mobile Assistant as a representative case. We decompose the threat landscape into four dimensions (Agent Identity, External Interface, Internal Reasoning, and Action Execution) and reveal critical flaws, including fake app identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation, that stem from a reliance on unstructured visual data. To address these challenges, we propose Aura, an Agent Universal Runtime Architecture for a clean-slate secure agent OS. Aura replaces brittle GUI scraping with a structured, agent-native interaction model. It adopts a Hub-and-Spoke topology in which a privileged System Agent orchestrates intent, sandboxed App Agents execute domain-specific tasks, and the Agent Kernel mediates all communication. The Agent Kernel enforces four defense pillars: (i) cryptographic identity binding via a Global Agent Registry; (ii) semantic input sanitization through a multilayer Semantic Firewall; (iii) cognitive integrity via taint-aware memory and plan-trajectory alignment; and (iv) granular access control with non-deniable auditing. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura improves low-risk Task Success Rate from roughly 75% to 94.3%, reduces high-risk Attack Success Rate from roughly 40% to 4.4%, and reduces latency by nearly an order of magnitude. These results demonstrate that Aura is a viable, secure alternative to the "Screen-as-Interface" paradigm.
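The Hub-and-Spoke mediation described above can be illustrated with a minimal sketch. It is not Aura's implementation: the abstract does not specify the kernel's interfaces, so every name here (`AgentKernel`, `Message`, `sign`, the agent ids, and the intent strings) is hypothetical. A shared-key HMAC stands in for the unspecified "cryptographic identity binding", where a real design would more likely use asymmetric signatures, and a plain Python list stands in for the audit trail.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

SYSTEM_AGENT = "system"  # hypothetical id for the privileged hub agent


@dataclass
class Message:
    sender: str
    recipient: str
    intent: str        # structured intent name, not screen content
    tag: bytes = b""   # HMAC over sender|recipient|intent

    def payload(self) -> bytes:
        return f"{self.sender}|{self.recipient}|{self.intent}".encode()


def sign(msg: Message, key: bytes) -> Message:
    """Attach an HMAC tag computed with the sender's key (assumed scheme)."""
    msg.tag = hmac.new(key, msg.payload(), hashlib.sha256).digest()
    return msg


@dataclass
class AgentKernel:
    registry: dict = field(default_factory=dict)     # agent id -> key
    permissions: dict = field(default_factory=dict)  # agent id -> accepted intents
    audit_log: list = field(default_factory=list)    # append-only record

    def register(self, agent_id: str, key: bytes, intents: set) -> None:
        self.registry[agent_id] = key
        self.permissions[agent_id] = set(intents)

    def route(self, msg: Message) -> bool:
        """Check a message and log the verdict whether or not it is allowed."""
        verdict = self._check(msg)
        self.audit_log.append((msg.sender, msg.recipient, msg.intent, verdict))
        return verdict == "allowed"

    def _check(self, msg: Message) -> str:
        key = self.registry.get(msg.sender)
        if key is None or msg.recipient not in self.registry:
            return "unknown-agent"
        expected = hmac.new(key, msg.payload(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, msg.tag):
            return "bad-signature"
        # Hub-and-spoke rule: App Agents never talk to each other directly.
        if SYSTEM_AGENT not in (msg.sender, msg.recipient):
            return "spoke-to-spoke"
        if msg.intent not in self.permissions[msg.recipient]:
            return "intent-not-permitted"
        return "allowed"


kernel = AgentKernel()
kernel.register("system", b"k-sys", {"result"})
kernel.register("pay_app", b"k-pay", {"pay"})
kernel.register("chat_app", b"k-chat", {"send_message"})

allowed = kernel.route(sign(Message("system", "pay_app", "pay"), b"k-sys"))
forged = kernel.route(sign(Message("system", "pay_app", "pay"), b"wrong-key"))
lateral = kernel.route(sign(Message("chat_app", "pay_app", "pay"), b"k-chat"))
```

In this sketch only the first message is delivered. The forged message fails the identity check, the lateral message violates the topology rule, and all three attempts are recorded in `audit_log`. The Semantic Firewall and taint-aware memory are out of scope here.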
