EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection

The growing prevalence of speech deepfakes has raised serious concerns, particularly in real-world scenarios such as telephone fraud and identity theft. While many anti-spoofing systems have demonstrated promising performance on lab-generated synthetic speech, they often fail when confronted with physical replay attacks, a common and low-cost form of attack in practical settings. Our experiments show that models trained on existing datasets exhibit severe performance degradation, with average accuracy dropping to 59.6% when evaluated on replayed audio. To bridge this gap, we present EchoFake, a comprehensive dataset comprising more than 120 hours of audio from over 13,000 speakers, featuring both cutting-edge zero-shot text-to-speech (TTS) speech and physical replay recordings collected under varied devices and real-world environmental settings. Additionally, we evaluate three baseline detection models and show that models trained on EchoFake achieve lower average equal error rates (EERs) across datasets, indicating better generalization. By introducing more practical challenges relevant to real-world deployment, EchoFake offers a more realistic foundation for advancing spoofing detection methods.
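The abstract reports results as equal error rates (EERs), the operating point where the false acceptance rate on spoofed audio equals the false rejection rate on bona fide audio. The abstract does not describe the exact evaluation protocol, so the following is only a generic sketch of how an EER can be estimated from detector scores. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration, not part of EchoFake.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER from two sets of detector scores.

    Assumption (hypothetical convention): a higher score means the
    utterance is more likely bona fide.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # FRR: fraction of bona fide utterances rejected (score below threshold).
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # FAR: fraction of spoofed utterances accepted (score at or above threshold).
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    eer, thr = equal_error_rate(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

A lower EER means fewer errors of both kinds at the balanced operating point, which is why a lower average EER across datasets is read as better generalization.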

