Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense

The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks such as OpenCaptchaWorld established a baseline for evaluating multimodal agents, recent reasoning-heavy models, such as Gemini3-Pro-High and GPT-5.2-Xhigh, have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles like "Bingo". In response, we introduce Next-Gen CAPTCHAs, a scalable defense framework designed to secure the next-generation web against advanced agents. Unlike static datasets, our benchmark is built on a robust data generation pipeline that supports large-scale, easily extensible evaluation. Notably, for backend-supported types, our system can generate an effectively unbounded number of CAPTCHA instances. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents, offering a scalable and diverse defense mechanism for the agentic era.


