RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection. Current evaluations of this threat either lack support for realistic but controlled environments or ignore hybrid web-OS attack scenarios that involve both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. Our sandbox supports key features tailored for red teaming, such as flexible adversarial scenario configuration and a setting that decouples adversarial evaluation from the navigational limitations of CUAs by initializing tests directly at the point of adversarial injection. Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities. Benchmarking current frontier CUAs identifies significant vulnerabilities: Claude 3.7 Sonnet | CUA demonstrates an attack success rate (ASR) of 42.9%, while Operator, the most secure CUA evaluated, still exhibits an ASR of 7.6%. Notably, CUAs often attempt to execute adversarial tasks, with an Attempt Rate as high as 92.5%, although they often fail to complete them due to capability limitations. Nevertheless, we observe concerningly high ASRs in realistic end-to-end settings, with the strongest-to-date Claude 4.5 Sonnet | CUA exhibiting the highest ASR of 60%, indicating that CUA threats can already result in tangible risks to users and computer systems. Overall, RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities, highlighting the urgent need for robust defenses against indirect prompt injection prior to real-world deployment.
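
The abstract reports two rates, ASR and Attempt Rate, without stating how they are computed. The sketch below shows one plausible reading, assuming each benchmark example yields two boolean labels: whether the agent pursued the injected task and whether it completed it. The `Outcome` type, the function names, and the rule that a success also counts as an attempt are illustrative assumptions, not the paper's definitions, and the sample data is a toy, not RTC-Bench results.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-example label; not taken from the paper."""
    attempted: bool  # agent took steps toward the injected (adversarial) task
    succeeded: bool  # injected task was fully carried out


def _percentage(flags: Iterable[bool]) -> float:
    """Share of True values, as a percentage."""
    values: List[bool] = list(flags)
    if not values:
        raise ValueError("at least one outcome is required")
    return 100.0 * sum(values) / len(values)


def attack_success_rate(outcomes: Iterable[Outcome]) -> float:
    """ASR: percentage of examples where the injected task was completed."""
    return _percentage(o.succeeded for o in outcomes)


def attempt_rate(outcomes: Iterable[Outcome]) -> float:
    """Attempt Rate: percentage of examples where the agent pursued the
    injected task. Assumption: a success always counts as an attempt."""
    return _percentage(o.attempted or o.succeeded for o in outcomes)


# Toy data for illustration only.
toy = [
    Outcome(attempted=True, succeeded=True),    # attack completed
    Outcome(attempted=True, succeeded=False),   # tried, lacked capability
    Outcome(attempted=True, succeeded=False),   # tried, lacked capability
    Outcome(attempted=False, succeeded=False),  # injection ignored
]
print(attack_success_rate(toy), attempt_rate(toy))
```

Under this reading the Attempt Rate can never be lower than the ASR, and the gap between them counts cases where the agent followed the injection but could not finish the task, which matches the capability limitation the abstract describes.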
