Existing speech anti-spoofing benchmarks rely on a narrow set of public models, creating a substantial gap from real-world scenarios in which commercial systems employ diverse, often proprietary APIs. To address this issue, we introduce MultiAPI Spoof, a multi-API audio anti-spoofing dataset comprising about 230 hours of synthetic speech generated by 30 distinct APIs, spanning commercial services, open-source models, and online platforms. Based on this dataset, we define the API tracing task, which enables fine-grained attribution of spoofed audio to its generation source. We further propose Nes2Net-LA, a local-attention enhanced variant of Nes2Net that improves local context modeling and fine-grained spoofing feature extraction. Experiments show that Nes2Net-LA achieves state-of-the-art performance and superior robustness, particularly under diverse and unseen spoofing conditions. The code\footnote{https://github.com/XuepingZhang/MultiAPI-Spoof} and dataset\footnote{https://xuepingzhang.github.io/MultiAPI-Spoof-Dataset/} have been released.