COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHAs. We identify the attack surface on which an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluate 7 representative MLLMs on 18 real-world CAPTCHA task types, measuring single-shot accuracy, success under limited retries, end-to-end latency, and per-solve cost. We further validate our findings with a supplemental external dataset and an adaptive-attacker setting with session memory, and we analyze how task-specific prompt engineering and few-shot demonstrations affect solver effectiveness. We find that MLLMs can reliably solve recognition-oriented and low-interaction CAPTCHA tasks at human-like cost and latency, whereas tasks requiring fine-grained localization, multi-step spatial reasoning, or cross-frame consistency remain significantly harder for current models. By examining the reasoning traces of these MLLMs, we investigate why models succeed or fail on specific CAPTCHA puzzles and use these insights to derive defense-oriented guidelines for selecting and strengthening CAPTCHA tasks. To validate these guidelines, we present a proof of concept that hardens a vulnerable CAPTCHA type. We demonstrate that incorporating fine-grained localization and implicit counting reduces the success rate of state-of-the-art MLLMs from over 95% to 0%, confirming that structural changes can effectively mitigate the threat. We conclude by emphasizing the urgent need for CAPTCHA redesign as MLLM capabilities increasingly threaten existing defenses. Code is available at https://doi.org/10.5281/zenodo.20406852.
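As a rough illustration of how two of the metrics above relate, the sketch below converts a single-shot accuracy into a success rate under limited retries. It assumes that attempts are independent and equally likely to succeed, which is a simplification: the abstract's adaptive-attacker setting with session memory would violate it. The function names and the example values are hypothetical and are not taken from the paper.

```python
def success_within_retries(p_single: float, max_attempts: int) -> float:
    """Probability of at least one success in max_attempts independent tries.

    Assumption (not from the paper): every attempt succeeds independently
    with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return 1.0 - (1.0 - p_single) ** max_attempts


def expected_attempts(p_single: float, max_attempts: int) -> float:
    """Expected number of attempts used when the solver stops at the first
    success or after max_attempts tries, whichever comes first."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Attempt i+1 is made only if the first i attempts all failed.
    return sum((1.0 - p_single) ** i for i in range(max_attempts))


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    p, k = 0.5, 3
    print(success_within_retries(p, k))
    print(expected_attempts(p, k))
```

Multiplying the expected number of attempts by a per-attempt API price gives a first estimate of the spend per session under the same independence assumption.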

