Component-level audio spoofing (Comp-Spoof) targets a new form of audio manipulation in which only specific components of a signal, such as speech or environmental sound, are forged or substituted while the other components remain genuine. Existing anti-spoofing datasets and methods treat an utterance or a segment as entirely bona fide or entirely spoofed, and thus cannot accurately detect component-level spoofing. To address this, we construct a new dataset, CompSpoof, covering multiple combinations of bona fide and spoofed speech and environmental sound. We further propose a separation-enhanced joint learning framework that separates the audio components and applies an anti-spoofing model to each one. Joint learning of the separation and detection modules preserves information relevant for detection. Extensive experiments demonstrate that our method outperforms the baseline, highlighting the necessity of separating components and of detecting spoofing for each component individually. The dataset and code are available at: https://github.com/XuepingZhang/CompSpoof.