Recent advances in operating system (OS) agents have enabled vision-language models (VLMs) to directly control a user's computer. Unlike conventional VLMs that passively output text, OS agents autonomously perform computer-based tasks in response to a single user prompt. OS agents do so by capturing, parsing, and analysing screenshots and executing low-level actions via application programming interfaces (APIs), such as mouse clicks and keyboard inputs. This direct interaction with the OS significantly raises the stakes, as failures or manipulations can have immediate and tangible consequences. In this work, we uncover a novel attack vector against these OS agents: Malicious Image Patches (MIPs), adversarially perturbed screen regions that, when captured by an OS agent, induce it to perform harmful actions by exploiting specific APIs. For instance, a MIP can be embedded in a desktop wallpaper or shared on social media to cause an OS agent to exfiltrate sensitive user data. We show that MIPs generalise across user prompts and screen configurations, and that they can hijack multiple OS agents even during the execution of benign instructions. These findings expose critical security vulnerabilities in OS agents that have to be carefully addressed before their widespread deployment.
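The capture, parse, and act cycle described above can be sketched as a simple loop. This is only an illustrative sketch, not the implementation of any particular OS agent: the names `Action`, `run_agent`, `fake_capture`, and `scripted_plan` are hypothetical, the "screenshot" is a placeholder array, and the planner is a fixed script standing in for a VLM.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """A low-level action the agent asks the OS to perform (hypothetical type)."""
    kind: str            # "click", "type", or "done"
    payload: tuple = ()  # e.g. (x, y) for a click, (text,) for typing


def run_agent(prompt, capture, plan, execute, max_steps=10):
    """Repeat capture -> plan -> execute until the planner returns "done".

    `capture`, `plan`, and `execute` are injected callables so that the loop
    itself stays independent of any real screenshot or input API.
    """
    history = []
    for _ in range(max_steps):
        screenshot = capture()                       # observe the screen
        action = plan(prompt, screenshot, history)   # the VLM would decide here
        if action.kind == "done":
            break
        execute(action)                              # mouse/keyboard API call
        history.append(action)
    return history


def fake_capture():
    # Placeholder 2x2 "screenshot"; a real agent would grab actual pixels.
    return [[0, 0], [0, 0]]


def scripted_plan(prompt, screenshot, history):
    # Stand-in for the VLM: replays a fixed script, one step per call.
    script = [Action("click", (10, 20)), Action("type", ("hello",)), Action("done")]
    return script[len(history)]


executed = []
trace = run_agent("open the editor and type hello",
                  fake_capture, scripted_plan, executed.append)
```

Because every action is chosen from what appears in the screenshot, anything rendered on screen becomes part of the agent's input, which is the surface that a MIP targets.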