Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

The evolution of Large Language Models (LLMs) has shifted mobile computing from app-centric interactions to system-level autonomous agents. Current implementations predominantly rely on a "Screen-as-Interface" paradigm, which inherits structural vulnerabilities and conflicts with the mobile ecosystem's economic foundations. In this paper, we conduct a systematic security analysis of state-of-the-art mobile agents, using Doubao Mobile Assistant as a representative case. We decompose the threat landscape into four dimensions (Agent Identity, External Interface, Internal Reasoning, and Action Execution) and reveal critical flaws, including fake app identity, visual spoofing, indirect prompt injection, and unauthorized privilege escalation, that stem from a reliance on unstructured visual data. To address these challenges, we propose Aura, an Agent Universal Runtime Architecture for a clean-slate secure agent OS. Aura replaces brittle GUI scraping with a structured, agent-native interaction model. It adopts a Hub-and-Spoke topology in which a privileged System Agent orchestrates intent, sandboxed App Agents execute domain-specific tasks, and the Agent Kernel mediates all communication. The Agent Kernel enforces four defense pillars: (i) cryptographic identity binding via a Global Agent Registry; (ii) semantic input sanitization through a multilayer Semantic Firewall; (iii) cognitive integrity via taint-aware memory and plan-trajectory alignment; and (iv) granular access control with non-deniable auditing. Evaluation on MobileSafetyBench shows that, compared to Doubao, Aura improves the low-risk Task Success Rate from roughly 75% to 94.3%, reduces the high-risk Attack Success Rate from roughly 40% to 4.4%, and reduces latency by nearly an order of magnitude. These results demonstrate that Aura is a viable, secure alternative to the "Screen-as-Interface" paradigm.
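To make the Hub-and-Spoke description concrete, the following is a minimal sketch of a kernel that mediates every message between agents, checks the sender against a registry, and records each decision in an audit log. It is an illustration under stated assumptions, not Aura's implementation: the class `AgentKernel`, its methods, the agent identifiers, and the use of an HMAC shared secret as a stand-in for the paper's cryptographic identity binding are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def sign(key: bytes, payload: str) -> str:
    """Return an HMAC-SHA256 tag binding a payload to the holder of `key`."""
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class AgentKernel:
    """Hypothetical hub: all agent-to-agent messages must pass through route()."""

    # agent_id -> secret key; an assumed stand-in for a Global Agent Registry
    registry: Dict[str, bytes] = field(default_factory=dict)
    # agent_id -> handler invoked when a message is delivered to that agent
    handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    # (sender, receiver, decision) tuples; an append-only audit trail
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def register(self, agent_id: str, key: bytes,
                 handler: Callable[[str], str]) -> None:
        self.registry[agent_id] = key
        self.handlers[agent_id] = handler

    def route(self, sender: str, receiver: str, payload: str, tag: str) -> str:
        key = self.registry.get(sender)
        # Reject messages from or to agents that are not in the registry.
        if key is None or receiver not in self.handlers:
            self.audit_log.append((sender, receiver, "denied:unknown-agent"))
            raise PermissionError("unknown sender or receiver")
        # Reject messages whose tag does not match the sender's registered key.
        if not hmac.compare_digest(sign(key, payload), tag):
            self.audit_log.append((sender, receiver, "denied:bad-signature"))
            raise PermissionError("identity check failed")
        self.audit_log.append((sender, receiver, "allowed"))
        return self.handlers[receiver](payload)


# Usage: a System Agent (hub) asks a sandboxed App Agent (spoke) to act.
kernel = AgentKernel()
system_key = b"system-agent-demo-key"      # illustrative key only
calendar_key = b"calendar-agent-demo-key"  # illustrative key only
kernel.register("system", system_key, lambda msg: "system ack: " + msg)
kernel.register("app.calendar", calendar_key,
                lambda msg: "calendar handled: " + msg)

intent = "create_event"
result = kernel.route("system", "app.calendar", intent, sign(system_key, intent))
print(result)
```

In this sketch the agents never call each other directly; a message with a forged tag is refused and the refusal is logged. A real system would presumably use asymmetric signatures and a tamper-evident log rather than shared secrets and an in-memory list.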
