The rapid progress in the reasoning capabilities of Multimodal Large Language Models (MLLMs) has spurred the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling them to automatically analyze user instructions and design task pipelines using only natural language and device screenshots as inputs. Despite the gains in human-machine interaction efficiency, the security risks of MLLM-based mobile agent systems have not been systematically studied. Existing security benchmarks for agents focus mainly on Web scenarios, and attack techniques against MLLMs are of limited applicability in the mobile agent setting. To close these gaps, this paper proposes a mobile agent security matrix covering 3 functional modules of such agent systems. Based on the security matrix, this paper identifies 4 realistic attack paths and verifies them through 8 attack methods. By analyzing the attack results, this paper reveals that MLLM-based mobile agent systems are not only vulnerable to multiple traditional attacks but also raise new, previously unconsidered security concerns. This paper highlights the need for security awareness in the design of MLLM-based systems and paves the way for future research on attack and defense methods.