The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks such as OpenCaptchaWorld established a baseline for evaluating multimodal agents, recent advances in reasoning-heavy models, such as Gemini3-Pro-High and GPT-5.2-Xhigh, have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles like "Bingo". In response, we introduce Next-Gen CAPTCHAs, a scalable defense framework designed to secure the next-generation web against advanced agents. Unlike static datasets, our benchmark is built on a robust data generation pipeline that supports large-scale, easily extended evaluation. Notably, for backend-supported CAPTCHA types, our system can generate an effectively unbounded number of instances. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents, offering a scalable and diverse defense mechanism for the agentic era.