To enhance the security of text CAPTCHAs, various methods have been employed, such as adding interference lines over the text, randomly distorting the characters, and overlapping multiple characters. These methods make automated segmentation and recognition attacks somewhat harder. However, with the rapid development of end-to-end breaking algorithms, their security has been greatly weakened. Diffusion models are a recent class of image generation models that can produce text images in which the characters and the background image are deeply fused. In this paper, an image-click CAPTCHA scheme called Diff-CAPTCHA is proposed based on denoising diffusion models. The background image and the characters of the CAPTCHA are treated as a whole to guide the generation process of the diffusion model, which weakens the character features available to machine learning, increases the diversity of character features in the CAPTCHA, and makes the CAPTCHA harder for breaking algorithms to solve. To evaluate the security of Diff-CAPTCHA, this paper develops several attack methods, including end-to-end attacks based on Faster R-CNN and two-stage attacks, and compares Diff-CAPTCHA with three baseline schemes, including a commercial CAPTCHA scheme and a security-enhanced CAPTCHA scheme based on style transfer. The experimental results show that diffusion models can effectively enhance CAPTCHA security while maintaining good usability in human testing.