Recently, AI-driven interactions with computing devices have advanced from basic prototype tools to sophisticated, LLM-based systems that emulate human-like operations in graphical user interfaces. We are now witnessing the emergence of \emph{Computer-Using Agents} (CUAs), capable of autonomously performing tasks such as navigating desktop applications, web pages, and mobile apps. However, as these agents grow in capability, they also introduce novel safety and security risks. Vulnerabilities in LLM-driven reasoning, together with the added complexity of integrating multiple software components and multimodal inputs, further complicate the security landscape. In this paper, we present a systematization of knowledge on the safety and security threats of CUAs. We conduct a comprehensive literature review and distill our findings along four research objectives: \textit{\textbf{(i)}} define CUAs in a way that suits safety analysis; \textit{\textbf{(ii)}} categorize current safety threats to CUAs; \textit{\textbf{(iii)}} propose a comprehensive taxonomy of existing defensive strategies; and \textit{\textbf{(iv)}} summarize prevailing benchmarks, datasets, and evaluation metrics used to assess the safety and performance of CUAs. Building on these insights, our work provides future researchers with a structured foundation for exploring as-yet-unexamined vulnerabilities and offers practitioners actionable guidance in designing and deploying secure Computer-Using Agents.