Recent computer-using-agent (CUA) red-teaming papers report prompt-injection attack success rates (ASR) of 42-98%, but these headline numbers are concentrated on retired models and on the most vulnerable model in each paper's panel. We ask whether those techniques, reproduced as hand-crafted templates, still work against current frontier CUAs. We release CUA-HandCrafted, a public benchmark of 793 episodes spanning 24 multi-step web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations. Against Claude Sonnet 4.6 and GPT-5.4 we measure 0/140 multi-step attack successes (Clopper-Pearson 95% upper bound 2.60%); a system-prompt ablation indicates that this resistance resides in the model weights rather than in the prompt. Yet the resistance does not generalize: on a sister coding-agent benchmark (SkillBench), the same models succumb to hand-crafted skill injection with ASR of up to 100%. We argue that the high ASR reported in the literature is largely attributable to RL-optimized injection text rather than to the attack categories themselves, and that frontier safety hardening is domain-conditioned, that is, specific to the heavily targeted browser surface. Reporting techniques without releasing the optimized strings, or extrapolating browser-domain safety to other CUA modalities, makes published ASR numbers unreproducible.
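The 2.60% figure can be checked with a few lines of arithmetic. The sketch below assumes the bound is the upper limit of a two-sided 95% Clopper-Pearson interval (alpha/2 in each tail), which is the convention that reproduces the reported value; the function names are illustrative and not taken from the benchmark code.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson_upper(k: int, n: int, alpha: float = 0.05) -> float:
    """Upper limit of the two-sided (1 - alpha) Clopper-Pearson interval.

    Assumption: alpha/2 is placed in each tail. The limit is the p that solves
    P(X <= k | n, p) = alpha/2, found here by bisection because the CDF is
    monotonically decreasing in p.
    """
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha / 2:
            lo = mid  # CDF still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2

upper = clopper_pearson_upper(0, 140)
print(f"{100 * upper:.2f}%")  # prints 2.60%
```

With zero successes the equation has the closed form 1 - (alpha/2)^(1/n), so the bisection is only needed for nonzero counts; a one-sided 95% bound would instead give roughly 2.12%.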