Recent advances in speech synthesis and editing have made spoofed speech increasingly difficult to detect. However, most existing methods treat spoofing detection as binary classification, overlooking the fact that diverse spoofing techniques manipulate multiple, coupled speech attributes and thereby alter the semantics of the speech. In this paper, we introduce HoliAntiSpoof, the first audio large language model (ALLM) framework for holistic speech anti-spoofing analysis. HoliAntiSpoof reformulates spoofing analysis as a unified text generation task, enabling joint reasoning over spoofing methods, affected speech attributes, and their semantic impacts. To support semantic-level analysis, we introduce DailyTalkEdit, a new anti-spoofing benchmark that simulates realistic conversational manipulations and provides annotations of semantic influence. Extensive experiments demonstrate that HoliAntiSpoof outperforms conventional baselines across multiple settings, while preliminary results show that in-context learning further improves out-of-domain generalization. These findings indicate that ALLMs not only enhance speech spoofing detection performance but also enable interpretable analysis of spoofing behaviors and their semantic effects, pointing towards more trustworthy and explainable speech security. Data and code are publicly available.