As audio deepfakes transition from research artifacts to widely available commercial tools, robust biometric authentication faces pressing security threats in high-stakes industries. This paper presents a systematic empirical evaluation of state-of-the-art speaker authentication systems using a large-scale speech synthesis dataset, revealing two major security vulnerabilities: 1) modern voice cloning models trained on very small samples can easily bypass commercial speaker verification systems; and 2) anti-spoofing detectors struggle to generalize across different audio synthesis methods, leaving a significant gap between in-domain performance and real-world robustness. These findings call for a reconsideration of current security measures and underscore the need for architectural innovations, adaptive defenses, and a transition toward multi-factor authentication.